CV | DB | Mar 16, 2024

Vector search with small radiuses

arXiv:2403.10746v1 | 1 citation | h-index: 48
Originality: Incremental advance
AI Analysis

The paper addresses the need for accurate range search in systems such as image matching. Its contribution is incremental, since it focuses on evaluation and optimization within existing indexing frameworks.

The paper tackles the problem of evaluating vector search for range queries, where all vectors within a given radius of the query are retrieved. It proposes a new metric, RSM, that is principled and easy to compute, and uses it to show that indexing methods tuned for top-k retrieval can be suboptimal for this task.

In recent years, the dominant accuracy metric for vector search has been the recall of a result list of fixed size (top-k retrieval), taking the exact vector retrieval results as ground truth. Although convenient to compute, this metric is only distantly related to the end-to-end accuracy of a full system that integrates vector search. In this paper we focus on the common case where a hard decision needs to be taken depending on the vector retrieval results, for example, deciding whether a query image matches a database image or not. We solve this as a range search task, where all vectors within a certain radius from the query are returned. We show that the value of a range search result can be modeled rigorously based on the query-to-vector distance. This yields a metric for range search, RSM, that is both principled and easy to compute without running an end-to-end evaluation. We apply this metric to the case of image retrieval. We show that indexing methods that are adapted for top-k retrieval do not necessarily maximize the RSM. In particular, for inverted-file-based indexes, we show that visiting a limited set of clusters and encoding vectors compactly yields near-optimal results.
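
To make the distinction between top-k retrieval and range search concrete, here is a minimal brute-force sketch in Python. It is only an illustration of the two query types described in the abstract, not the paper's indexing method, and it does not compute RSM. The function names, the toy 2-D data, and the radius of 0.5 are all assumptions chosen for the example.

```python
import math

def top_k_search(db, query, k):
    """Exact top-k: always returns k indices, however far away they are."""
    order = sorted(range(len(db)), key=lambda i: math.dist(db[i], query))
    return order[:k]

def range_search(db, query, radius):
    """Exact range search: returns every index within `radius` of the query."""
    return [i for i, v in enumerate(db) if math.dist(v, query) <= radius]

# Hypothetical toy database of 2-D vectors.
db = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (5.0, 5.0)]

near_query = (0.05, 0.0)   # has close neighbors in the database
far_query = (10.0, 10.0)   # has no close neighbors

near_topk = top_k_search(db, near_query, k=2)             # [0, 1]
far_topk = top_k_search(db, far_query, k=2)               # [3, 2]
near_range = range_search(db, near_query, radius=0.5)     # [0, 1]
far_range = range_search(db, far_query, radius=0.5)       # []
```

For the far query, top-k still returns two results even though neither is a plausible match, while range search returns an empty list. This is the sense in which range search fits a hard match/no-match decision better than a fixed-size result list.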

