On the Needs for Rotations in Hypercubic Quantization Hashing
This paper provides theoretical guarantees for a widely used family of hashing methods, addressing a fundamental issue in efficient similarity search for applications such as information retrieval. The contribution is incremental, since it builds on existing experimental findings.
The paper addresses nearest neighbor search accuracy in hypercubic quantization hashing. It proves that applying a rotation after dimensionality reduction is optimal under mild assumptions: obtaining optimal binary sketches is equivalent to applying a rotation that uniformizes the diagonal of the covariance matrix, and the probability that two nearby points receive dissimilar sketches is upper-bounded by a quantity proportional to their initial distance.
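The pipeline described above can be illustrated with a minimal NumPy sketch under stated assumptions: PCA serves as the dimensionality reduction step, the rotation is a Haar-random orthogonal matrix obtained from the QR decomposition of a Gaussian matrix (one possible "random" rotation; a learned rotation would replace it), and each rotated coordinate is thresholded at zero to produce a binary sketch compared by Hamming distance. The function names (`pca_projection`, `random_rotation`, `binary_sketch`, `hamming`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def pca_projection(X, k):
    """Return the k leading principal directions of X (n x d) as a d x k matrix."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / Xc.shape[0]
    # eigh returns eigenvalues in ascending order; keep the k largest
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]

def random_rotation(k, rng):
    """Draw a k x k orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((k, k)))
    # Flip column signs so the result is uniformly (Haar) distributed
    return Q * np.sign(np.diag(R))

def binary_sketch(X, mean, W, R):
    """Center, project with W, rotate with R, then keep the sign of each coordinate."""
    Z = (X - mean) @ W @ R
    return (Z > 0).astype(np.uint8)

def hamming(a, b):
    """Number of differing bits between two binary sketches."""
    return int(np.count_nonzero(a != b))

# Illustrative usage on synthetic Gaussian data
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 32))
mean = X.mean(axis=0)
W = pca_projection(X, 8)
R = random_rotation(8, rng)
B = binary_sketch(X, mean, W, R)
print(B.shape, hamming(B[0], B[1]))
```

Intuitively, PCA concentrates variance in the first few projected coordinates; the rotation spreads that variance across all bits before thresholding, which is the effect the paper analyzes.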
The aim of this paper is to endow the well-known family of hypercubic quantization hashing methods with theoretical guarantees. In hypercubic quantization, applying a suitable (random or learned) rotation after dimensionality reduction has been experimentally shown to improve accuracy in the nearest neighbor search problem. We prove in this paper that the use of these rotations is optimal under some mild assumptions: obtaining optimal binary sketches is equivalent to applying a rotation that uniformizes the diagonal of the covariance matrix between data points. Moreover, for two close points, the probability of obtaining dissimilar binary sketches is upper-bounded by a quantity proportional to the initial distance between the data points. Relaxing these assumptions, we obtain a general concentration result for random matrices. We also provide experiments illustrating these theoretical points and compare a set of algorithms in both the batch and online settings.
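A rotation that uniformizes the covariance diagonal always exists, and one can be built constructively. The sketch below uses a standard linear-algebra construction chosen for illustration, not necessarily the procedure used in the paper: repeatedly apply a Givens rotation in the plane of the largest and smallest diagonal entries, choosing the angle so that the largest entry becomes exactly trace(S) / k. Only those two diagonal entries change at each step and one of them is fixed at the target, so at most k - 1 rotations suffice. The function name `uniformize_diagonal` and its `tol` parameter are hypothetical.

```python
import numpy as np

def uniformize_diagonal(S, tol=1e-10):
    """Return an orthogonal Q such that Q.T @ S @ Q has every diagonal entry
    equal to trace(S) / k, built from at most k - 1 Givens rotations."""
    S = np.array(S, dtype=float)
    k = S.shape[0]
    t = np.trace(S) / k              # target value for each diagonal entry
    Q = np.eye(k)
    for _ in range(k):
        d = np.diag(S)
        i, j = int(np.argmax(d)), int(np.argmin(d))
        if d[i] - t <= tol * max(1.0, abs(t)):
            break                    # diagonal is already (numerically) constant
        a, e, b = S[i, i], S[j, j], S[i, j]
        # With tau = tan(theta), the rotated (i, i) entry equals t when
        # tau^2 (e - t) + 2 b tau + (a - t) = 0; since a > t > e a real root exists.
        disc = b * b - (e - t) * (a - t)
        tau = (-b + np.sqrt(disc)) / (e - t)
        c = 1.0 / np.sqrt(1.0 + tau * tau)
        s = tau * c
        G = np.eye(k)
        G[i, i], G[j, j] = c, c
        G[j, i], G[i, j] = s, -s
        S = G.T @ S @ G              # only S[i, i] and S[j, j] change on the diagonal
        Q = Q @ G
    return Q

# Illustrative usage: a diagonal covariance such as PCA eigenvalues
cov = np.diag([5.0, 2.0, 0.5, 0.5])
Q = uniformize_diagonal(cov)
print(np.round(np.diag(Q.T @ cov @ Q), 6))  # → [2. 2. 2. 2.]
```

Applied to projected data Z with covariance S, the rotated data Z @ Q has covariance Q.T @ S @ Q, so every coordinate (and hence every bit after thresholding) carries the same variance.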