DSIRLGSep 29, 2025

Efficient Sketching and Nearest Neighbor Search Algorithms for Sparse Vector Sets

arXiv:2509.24815v13 citationsh-index: 24
Originality Incremental advance
AI Analysis

This work addresses the challenge of efficient nearest neighbor search for sparse embeddings, which is important for applications like information retrieval and machine learning, though it appears incremental as it builds on existing ANNS methods.

The authors tackled the problem of approximate nearest neighbor search for sparse vectors by introducing novel data structures and algorithms, resulting in their method Seismic achieving sub-millisecond per-query latency with high accuracy on a large-scale benchmark dataset using a single CPU.

Sparse embeddings of data form an attractive class due to their inherent interpretability: Every dimension is tied to a term in some vocabulary, making it easy to visually decipher the latent space. Sparsity, however, poses unique challenges for Approximate Nearest Neighbor Search (ANNS) which finds, from a collection of vectors, the k vectors closest to a query. To encourage research on this underexplored topic, sparse ANNS featured prominently in a BigANN Challenge at NeurIPS 2023, where approximate algorithms were evaluated on large benchmark datasets by throughput and accuracy. In this work, we introduce a set of novel data structures and algorithmic methods, a combination of which leads to an elegant, effective, and highly efficient solution to sparse ANNS. Our contributions range from a theoretically-grounded sketching algorithm for sparse vectors to reduce their effective dimensionality while preserving inner product-induced ranks; a geometric organization of the inverted index; and the blending of local and global information to improve the efficiency and efficacy of ANNS. Empirically, our final algorithm, dubbed Seismic, reaches sub-millisecond per-query latency with high accuracy on a large-scale benchmark dataset using a single CPU.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes