Memory vectors for similarity search in high-dimensional spaces
This addresses the computational bottleneck in large-scale similarity search for applications like image retrieval, though it is an incremental improvement on existing indexing methods.
The paper tackles the problem of fast similarity search in high-dimensional vector databases by proposing an indexing architecture using memory units with representative vectors, which significantly speeds up search compared to exhaustive methods without compromising quality, with provable complexity reductions in high dimensions.
We study an indexing architecture to store and search in a database of high-dimensional vectors from the perspective of statistical signal processing and decision theory. This architecture is composed of several memory units, each of which summarizes a fraction of the database by a single representative vector. The potential similarity of the query to one of the vectors stored in the memory unit is gauged by a simple correlation with the memory unit's representative vector. This representative optimizes the test of the following hypothesis: the query is independent from any vector in the memory unit vs. the query is a simple perturbation of one of the stored vectors. Compared to exhaustive search, our approach finds the most similar database vectors significantly faster without a noticeable reduction in search quality. Interestingly, the reduction of complexity is provably better in high-dimensional spaces. We empirically demonstrate its practical interest in a large-scale image search scenario with off-the-shelf state-of-the-art descriptors.