DB · Apr 2

Towards Robustness: A Critique of Current Vector Database Assessments

arXiv:2507.00379 · 10.41 citations · h-index: 7
Predicted impact: top 24% in DB (last 90 days) · Originality: Incremental advance
AI Analysis

This work addresses a critical issue for users and researchers of AI systems that rely on vector databases: it provides a more robust evaluation metric aimed at improving real-world performance. It is an incremental advance, since it builds on existing benchmarks.

The paper tackles the evaluation of vector databases, showing that relying on average recall is problematic because it hides variability across queries; this leads to underperformance on hard queries and to failures in downstream applications such as RAG. It proposes Robustness-δ@K, a new metric that captures the fraction of queries whose recall is above a threshold δ, and integrates it into existing benchmarks. The resulting evaluation reveals significant robustness differences among vector indexes, and the more robust indexes yield better application performance even at the same average recall.
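For concreteness, the two metrics can be written as below. The notation is ours and not necessarily the paper's: $Q$ is the query set, $G_q$ the exact top-$K$ neighbours of query $q$, and $R_q$ the $K$ results returned by the index. We also assume "above a threshold" means $\ge \delta$; the strict variant differs only for queries whose recall equals $\delta$ exactly.

$$\mathrm{Recall}@K(q) = \frac{|R_q \cap G_q|}{K}$$

$$\mathrm{AvgRecall}@K = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{Recall}@K(q)$$

$$\text{Robustness-}\delta@K = \frac{1}{|Q|} \sum_{q \in Q} \mathbb{1}\left[\mathrm{Recall}@K(q) \ge \delta\right]$$

For example, an index with Recall@K of 1.0 on half the queries and 0.8 on the other half has an average recall of 0.9 but a Robustness-0.9@K of only 0.5.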

Vector databases are critical infrastructure in AI systems, and average recall is the dominant metric for their evaluation. Both users and researchers rely on it to choose and optimize their systems. We show that relying on average recall is problematic. It hides variability across queries, allowing systems with strong mean performance to underperform significantly on hard queries. These tail cases confuse users and can lead to failure in downstream applications such as RAG. We argue that robustness, i.e., consistently achieving acceptable recall across queries, is crucial to vector database evaluation. We propose Robustness-$\delta$@K, a new metric that captures the fraction of queries with recall above a threshold $\delta$. This metric offers a deeper view of the recall distribution, helps select vector indexes according to application needs, and guides the optimization of tail performance. We integrate Robustness-$\delta$@K into existing benchmarks and evaluate mainstream vector indexes, revealing significant robustness differences. More robust vector indexes yield better application performance, even with the same average recall. We also identify design factors that influence robustness, providing guidance for improving real-world performance.
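A minimal Python sketch of how average recall and Robustness-δ@K can disagree. It assumes per-query Recall@K is computed against exact top-K ground truth and reads "above δ" as ≥ δ. The function names and the per-query recall values are hypothetical, chosen for illustration rather than taken from the paper or any benchmark.

```python
from typing import Iterable, Sequence


def recall_at_k(returned: Sequence[int], ground_truth: Iterable[int], k: int) -> float:
    """Recall@K for one query: share of the k true neighbours found in the first k results.

    Assumes ground_truth holds exactly the k exact nearest-neighbour ids.
    """
    return len(set(returned[:k]) & set(ground_truth)) / k


def average_recall(recalls: Sequence[float]) -> float:
    """Mean Recall@K over all queries (the conventional headline metric)."""
    return sum(recalls) / len(recalls)


def robustness_at_k(recalls: Sequence[float], delta: float) -> float:
    """Robustness-delta@K: fraction of queries whose Recall@K is at least delta.

    Assumption: 'above the threshold' is read as >= delta.
    """
    return sum(1 for r in recalls if r >= delta) / len(recalls)


if __name__ == "__main__":
    # Hypothetical per-query Recall@10 values (illustrative, not measured).
    index_a = [0.8, 0.8, 0.8, 0.8]  # uniform recall across queries
    index_b = [1.0, 1.0, 1.0, 0.2]  # perfect on most queries, poor on one hard query

    for name, recalls in [("A", index_a), ("B", index_b)]:
        print(
            name,
            f"avg={average_recall(recalls):.2f}",
            f"R-0.5={robustness_at_k(recalls, 0.5):.2f}",
            f"R-0.9={robustness_at_k(recalls, 0.9):.2f}",
        )
```

Both hypothetical indexes report an average recall of 0.80, yet A scores 1.00 at δ = 0.5 and 0.00 at δ = 0.9, while B scores 0.75 at both thresholds. Which one is preferable depends on the threshold an application needs, which is the kind of index-selection question the metric is meant to answer.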
