DBApr 3

Distance Comparison Operations Are Not Silver Bullets in Vector Similarity Search: A Benchmark Study on Their Merits and Limits

arXiv:2604.0280154.2h-index: 2
AI Analysis

This work addresses the readiness of recent DCO methods for production vector database systems, highlighting their limitations and incremental improvements.

The study benchmarked 8 Distance Comparison Operation (DCO) algorithms across 10 datasets with up to 100M vectors and 12,288 dimensions, finding that their efficiency is highly sensitive to data dimensionality, degrades under out-of-distribution queries, and is unstable across hardware, though they can accelerate index construction and data updates.

Distance Comparison Operations (DCOs), which decide whether the distance between a data vector and a query is within a threshold, are a critical performance bottleneck in vector similarity search. Recent DCO methods that avoid full-dimensional distance computations promise significant speedups, but their readiness for production vector database systems remains an open question. To address this, we conduct a comprehensive benchmark of 8 DCO algorithms across 10 datasets (with up to 100M vectors and 12,288 dimensions) and diverse hardware configurations (CPUs with/without SIMD, and GPUs). Our study reveals that these methods are not silver bullets: their efficiency is highly sensitive to data dimensionality, degrades under out-of-distribution queries, and is unstable across hardware. Yet, our evaluation also demonstrates often-overlooked merits: they can accelerate index construction and data updates. Despite these benefits, their unstable performance, which can be slower than a full-dimensional scan, leads us to conclude that recent algorithmic advancements in DCO are not yet ready for production deployment.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes