IRMay 20, 2021

FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search

arXiv:2105.09613v1113 citations
Originality Highly original
AI Analysis

This addresses the need for fresh indices in real-world applications like document or news indexing, where incremental updates are crucial but previously expensive.

The paper tackles the problem of real-time updates in graph-based approximate nearest neighbor search indices, which are typically static and require costly periodic rebuilding. It presents FreshDiskANN, a system that supports thousands of concurrent inserts, deletes, and searches per second while maintaining over 95% recall, reducing maintenance costs by 5-10x compared to existing methods.

Approximate nearest neighbor search (ANNS) is a fundamental building block in information retrieval with graph-based indices being the current state-of-the-art and widely used in the industry. Recent advances in graph-based indices have made it possible to index and search billion-point datasets with high recall and millisecond-level latency on a single commodity machine with an SSD. However, existing graph algorithms for ANNS support only static indices that cannot reflect real-time changes to the corpus required by many key real-world scenarios (e.g. index of sentences in documents, email, or a news index). To overcome this drawback, the current industry practice for manifesting updates into such indices is to periodically re-build these indices, which can be prohibitively expensive. In this paper, we present the first graph-based ANNS index that reflects corpus updates into the index in real-time without compromising on search performance. Using update rules for this index, we design FreshDiskANN, a system that can index over a billion points on a workstation with an SSD and limited memory, and support thousands of concurrent real-time inserts, deletes and searches per second each, while retaining $>95\%$ 5-recall@5. This represents a 5-10x reduction in the cost of maintaining freshness in indices when compared to existing methods.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes