DSCGLGOct 29, 2023

Worst-case Performance of Popular Approximate Nearest Neighbor Search Implementations: Guarantees and Limitations

arXiv:2310.19126v143 citationsh-index: 5
Originality Incremental advance
AI Analysis

This provides theoretical guarantees and limitations for practitioners using these popular ANN search tools, revealing potential performance pitfalls in real-world applications.

The paper analyzes the worst-case performance of popular graph-based approximate nearest neighbor search implementations (HNSW, NSG, DiskANN), showing that DiskANN's slow preprocessing version provably achieves constant approximation ratio and poly-logarithmic query time on bounded-dimension data, while other variants can require linear query time (e.g., at least 0.1n steps for DiskANN to find any of the 5 nearest neighbors).

Graph-based approaches to nearest neighbor search are popular and powerful tools for handling large datasets in practice, but they have limited theoretical guarantees. We study the worst-case performance of recent graph-based approximate nearest neighbor search algorithms, such as HNSW, NSG and DiskANN. For DiskANN, we show that its "slow preprocessing" version provably supports approximate nearest neighbor search query with constant approximation ratio and poly-logarithmic query time, on data sets with bounded "intrinsic" dimension. For the other data structure variants studied, including DiskANN with "fast preprocessing", HNSW and NSG, we present a family of instances on which the empirical query time required to achieve a "reasonable" accuracy is linear in instance size. For example, for DiskANN, we show that the query procedure can take at least $0.1 n$ steps on instances of size $n$ before it encounters any of the $5$ nearest neighbors of the query.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes