LGIRApr 7, 2023

Similarity search in the blink of an eye with compressed indices

arXiv:2304.04759v264 citationsh-index: 15
Originality Highly original
AI Analysis

This work addresses the bottleneck of random-access memory patterns in graph-based indices for similarity search, offering significant speed and memory improvements for applications handling large-scale vector data.

The paper tackles the problem of similarity search for billion-scale vectors by introducing a novel compression method, Locally-adaptive Vector Quantization (LVQ), combined with a new computing system, achieving up to 20.7x higher throughput and up to 3x memory reduction compared to alternatives.

Nowadays, data is represented by vectors. Retrieving those vectors, among millions and billions, that are similar to a given query is a ubiquitous problem, known as similarity search, of relevance for a wide range of applications. Graph-based indices are currently the best performing techniques for billion-scale similarity search. However, their random-access memory pattern presents challenges to realize their full potential. In this work, we present new techniques and systems for creating faster and smaller graph-based indices. To this end, we introduce a novel vector compression method, Locally-adaptive Vector Quantization (LVQ), that uses per-vector scaling and scalar quantization to improve search performance with fast similarity computations and a reduced effective bandwidth, while decreasing memory footprint and barely impacting accuracy. LVQ, when combined with a new high-performance computing system for graph-based similarity search, establishes the new state of the art in terms of performance and memory footprint. For billions of vectors, LVQ outcompetes the second-best alternatives: (1) in the low-memory regime, by up to 20.7x in throughput with up to a 3x memory footprint reduction, and (2) in the high-throughput regime by 5.8x with 1.4x less memory.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes