LG | IR | Oct 22, 2022

OOD-DiskANN: Efficient and Scalable Graph ANNS for Out-of-Distribution Queries

arXiv:2211.12850v2 | 42 citations | h-index: 26
Originality: Incremental advance
AI Analysis

This paper addresses inefficient ANNS for OOD queries, a problem that matters for applications such as cross-modal retrieval. The approach is incremental because it builds on existing graph-based methods.

The paper tackles the performance degradation of Approximate Nearest Neighbor Search (ANNS) algorithms on out-of-distribution (OOD) queries relative to in-distribution (ID) queries. It presents OOD-DiskANN, which uses a small sample of OOD queries (1% of the index set size) to achieve up to 40% improvement in mean query latency over state-of-the-art methods of a similar memory footprint.

State-of-the-art algorithms for Approximate Nearest Neighbor Search (ANNS) such as DiskANN, FAISS-IVF, and HNSW build data-dependent indices that offer substantially better accuracy and search efficiency than data-agnostic indices by overfitting to the index data distribution. When the query data is drawn from a different distribution (e.g., when the index represents image embeddings and the queries represent textual embeddings), such algorithms lose much of this performance advantage. On a variety of datasets, for a fixed recall target, latency is worse by an order of magnitude or more for Out-Of-Distribution (OOD) queries compared to In-Distribution (ID) queries. The question we address in this work is whether ANNS algorithms can be made efficient for OOD queries if the index construction is given access to a small sample set of these queries. We answer positively by presenting OOD-DiskANN, which uses a small sample (1% of index set size) of OOD queries, and provides up to 40% improvement in mean query latency over state-of-the-art algorithms of a similar memory footprint. OOD-DiskANN is scalable and has the efficiency of graph-based ANNS indices. Some of our contributions can improve query efficiency for ID queries as well.
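
The abstract compares latency at a fixed recall target and sizes the query sample at 1% of the index set. As a minimal sketch of those two quantities (not the paper's code), recall@k can be measured against brute-force nearest neighbors. The names `exact_knn`, `recall_at_k`, and `query_sample_size` are hypothetical, and squared Euclidean distance is an assumption, since the abstract does not name the distance metric.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_l2(a: Vector, b: Vector) -> float:
    # Assumed metric: squared Euclidean distance (not stated in the abstract).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_knn(index: List[Vector], query: Vector, k: int) -> List[int]:
    # Brute-force ground truth: ids of the k index points closest to the query.
    order = sorted(range(len(index)), key=lambda i: squared_l2(index[i], query))
    return order[:k]


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int]) -> float:
    # Fraction of the true k nearest neighbors that the approximate search returned.
    if not exact_ids:
        raise ValueError("exact_ids must be non-empty")
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def query_sample_size(index_size: int) -> int:
    # 1% of the index set size, as quoted in the abstract (at least one query).
    return max(1, index_size // 100)


if __name__ == "__main__":
    # Toy data for illustration only.
    index = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
    truth = exact_knn(index, (0.1, 0.0), k=2)
    # Pretend an approximate index returned ids 0 and 2.
    print(truth, recall_at_k([0, 2], truth))
    print(query_sample_size(1_000_000))
```

To compare ID and OOD behavior under this sketch, one would tune an index's search parameter separately for each query set until both reach the same recall@k, then compare the measured latencies.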

