IROct 31, 2016

Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search

arXiv:1610.10001v166 citations
Originality Incremental advance
AI Analysis

This addresses inefficiencies in retrieval systems for users needing more effective and efficient search, though it is incremental as it builds on existing k-NN methods.

The paper tackles the problem of vocabulary mismatch in retrieval pipelines by replacing term-based search with approximate k-NN retrieval, achieving nearly two orders of magnitude faster speed with only a small accuracy loss.

Retrieval pipelines commonly rely on a term-based search to obtain candidate records, which are subsequently re-ranked. Some candidates are missed by this approach, e.g., due to a vocabulary mismatch. We address this issue by replacing the term-based search with a generic k-NN retrieval algorithm, where a similarity function can take into account subtle term associations. While an exact brute-force k-NN search using this similarity function is slow, we demonstrate that an approximate algorithm can be nearly two orders of magnitude faster at the expense of only a small loss in accuracy. A retrieval pipeline using an approximate k-NN search can be more effective and efficient than the term-based pipeline. This opens up new possibilities for designing effective retrieval pipelines. Our software (including data-generating code) and derivative data based on the Stack Overflow collection is available online.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes