IR LG May 6, 2024

Adaptive Retrieval and Scalable Indexing for k-NN Search with Cross-Encoders

Nishant Yadav, Nicholas Monath, Manzil Zaheer, Rob Fergus, Andrew McCallum

arXiv:2405.03651v12.22 citations ICLR

Originality: Incremental advance

AI Analysis

This work addresses the challenge of scalable and accurate retrieval for search systems, offering a practical solution for domains where cross-encoders are too slow. It is an incremental advance because it builds on existing factorization techniques.

The paper tackles the problem of efficient k-NN search using cross-encoders by proposing a sparse-matrix factorization method that approximates CE scores with fewer CE calls, improving recall by up to 5% (k=1) and 54% (k=100) over dual-encoder approaches and achieving speedups of up to 100x over CUR-based methods.

Cross-encoder (CE) models, which compute similarity by jointly encoding a query-item pair, perform better than embedding-based models (dual-encoders) at estimating query-item relevance. Existing approaches perform k-NN search with CE by approximating the CE similarity with a vector embedding space fit either with dual-encoders (DE) or CUR matrix factorization. DE-based retrieve-and-rerank approaches suffer from poor recall on new domains, and the retrieval with DE is decoupled from the CE. While CUR-based approaches can be more accurate than the DE-based approach, they require a prohibitively large number of CE calls to compute item embeddings, making them impractical for deployment at scale. In this paper, we address these shortcomings with our proposed method based on sparse-matrix factorization, which efficiently computes latent query and item embeddings to approximate CE scores and performs k-NN search with the approximate CE similarity. We compute item embeddings offline by factorizing a sparse matrix containing query-item CE scores for a set of train queries. Our method produces a high-quality approximation while requiring only a fraction of the CE calls needed by CUR-based methods, and allows for leveraging DE to initialize the embedding space while avoiding compute- and resource-intensive finetuning of DE via distillation. At test time, the item embeddings remain fixed and retrieval occurs over rounds, alternating between a) estimating the test query embedding by minimizing error in approximating CE scores of items retrieved thus far, and b) using the updated test query embedding for retrieving more items. Our k-NN search method improves recall by up to 5% (k=1) and 54% (k=100) over DE-based approaches. Additionally, our indexing approach achieves a speedup of up to 100x over CUR-based and 5x over DE distillation methods, while matching or improving k-NN search recall over baselines.
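The test-time loop described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `adaptive_retrieve`, the uniform sampling of first-round items, the plain least-squares fit, and the toy stand-in for the cross-encoder (`toy_ce`, an exactly low-rank scorer) are all assumptions made for illustration. Item embeddings are taken as already computed offline and held fixed.

```python
import numpy as np

def adaptive_retrieve(ce_score, item_emb, k, budget, n_rounds, rng):
    """Hypothetical sketch of multi-round retrieval with fixed item embeddings.

    ce_score(i) returns the (expensive) cross-encoder score of item i for the
    current test query; item_emb is an (n_items, dim) array computed offline.
    """
    n_items = item_emb.shape[0]
    per_round = budget // n_rounds
    # Round 0: no query embedding exists yet, so pick items uniformly at random
    # (assumption; a dual-encoder could supply these first candidates instead).
    scored = rng.choice(n_items, size=per_round, replace=False).tolist()
    scores = [ce_score(i) for i in scored]
    for _ in range(n_rounds - 1):
        # (a) Estimate the query embedding by least squares on the items
        #     scored so far: minimize ||item_emb[scored] @ q - scores||^2.
        q, *_ = np.linalg.lstsq(item_emb[scored], np.array(scores), rcond=None)
        # (b) Retrieve the best not-yet-scored items under the approximate
        #     scores, then spend CE calls on them.
        approx = item_emb @ q
        approx[scored] = -np.inf
        new = np.argsort(-approx)[:per_round].tolist()
        scored.extend(new)
        scores.extend(ce_score(i) for i in new)
    # Return the top-k items ranked by their exact CE scores.
    order = np.argsort(-np.array(scores))[:k]
    return [scored[j] for j in order]

# Toy setup (assumption): CE scores are exactly low-rank in the item embeddings.
rng = np.random.default_rng(0)
item_emb = rng.normal(size=(500, 8))
hidden_q = rng.normal(size=8)
exact = item_emb @ hidden_q
calls = []

def toy_ce(i):
    calls.append(i)          # record each simulated CE call
    return float(exact[i])

top = adaptive_retrieve(toy_ce, item_emb, k=5, budget=48, n_rounds=3, rng=rng)
```

In this toy setting the scores are exactly linear in the item embeddings, so the first least-squares fit already recovers the query vector and 48 CE calls over 500 items suffice to find the true top 5. With a real cross-encoder the fit is only approximate, which is why the abstract describes alternating the two steps over several rounds.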
