IR · AI · Apr 22

Semantic Recall for Vector Search

arXiv:2604.20417 · 58.8 · h-index: 6
Predicted impact: top 58% in IR · last 90 days
Originality: Incremental advance
AI Analysis

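This work addresses a specific issue in vector search evaluation: traditional recall penalizes algorithms for missing nearest neighbors that are semantically irrelevant to the query. It offers researchers and practitioners an incremental improvement in metric design.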

The paper tackles the evaluation of approximate nearest neighbor search algorithms by introducing Semantic Recall, a metric that counts only the semantically relevant objects that exact nearest neighbor search could retrieve. The authors report that it is a more effective indicator of retrieval quality than traditional recall and that optimizing search algorithms for it can lead to better cost-quality tradeoffs.

We introduce Semantic Recall, a novel metric to assess the quality of approximate nearest neighbor search algorithms by considering only semantically relevant objects that are theoretically retrievable via exact nearest neighbor search. Unlike traditional recall, semantic recall does not penalize algorithms for failing to retrieve objects that are semantically irrelevant to the query, even if those objects are among their nearest neighbors. We demonstrate that semantic recall is particularly useful for assessing retrieval quality on queries that have few relevant results among their nearest neighbors, a scenario we uncover to be common within embedding datasets. Additionally, we introduce Tolerant Recall, a proxy metric that approximates semantic recall when semantically relevant objects cannot be identified. We empirically show that our metrics are more effective indicators of retrieval quality, and that optimizing search algorithms for these metrics can lead to improved cost-quality tradeoffs.
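
The contrast between traditional recall and semantic recall described in the abstract can be illustrated with a minimal sketch. This is one reading of the abstract, not the paper's reference implementation: the function names, the representation of relevance judgments as a plain set of IDs, and the choice to return `None` when no exact neighbor is relevant are all assumptions made here for illustration. Tolerant Recall is not sketched because the abstract does not define it.

```python
def recall_at_k(retrieved, exact_neighbors):
    """Traditional recall: fraction of the exact top-k neighbors that were retrieved."""
    exact = set(exact_neighbors)
    if not exact:
        return None
    return len(exact & set(retrieved)) / len(exact)


def semantic_recall_at_k(retrieved, exact_neighbors, relevant):
    """Fraction of the semantically relevant exact neighbors that were retrieved.

    Only objects that are both relevant and among the exact nearest
    neighbors count, since those are the ones exact search could return.
    """
    target = set(exact_neighbors) & set(relevant)
    if not target:
        # Assumption: treat the metric as undefined for queries whose
        # exact neighbors contain no relevant object.
        return None
    return len(target & set(retrieved)) / len(target)


# Hypothetical query: only two of the five exact neighbors are relevant.
exact_neighbors = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d2", "d9"}  # d9 is relevant but not an exact neighbor
retrieved = ["d1", "d2", "d6", "d7", "d8"]  # output of an approximate search

print(recall_at_k(retrieved, exact_neighbors))
print(semantic_recall_at_k(retrieved, exact_neighbors, relevant))
```

In this made-up example the approximate search misses three exact neighbors, so traditional recall is low, but every missed neighbor is irrelevant to the query, so semantic recall is not reduced. That is the situation the abstract describes for queries with few relevant results among their nearest neighbors.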

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
