DB | Apr 3

Distance Comparison Operations Are Not Silver Bullets in Vector Similarity Search: A Benchmark Study on Their Merits and Limits

Zhuanglin Zheng, Yuxiang Zeng, Chenchen Liu, Yunzhen Chi, Binhan Yang, Yongxin Tong

arXiv:2604.02801 | 54.2 | h-index: 2

AI Analysis

This work addresses the readiness of recent DCO methods for production vector database systems, highlighting their limitations and incremental improvements.

The study benchmarked 8 Distance Comparison Operation (DCO) algorithms across 10 datasets with up to 100M vectors and 12,288 dimensions, finding that their efficiency is highly sensitive to data dimensionality, degrades under out-of-distribution queries, and is unstable across hardware, though they can accelerate index construction and data updates.

Distance Comparison Operations (DCOs), which decide whether the distance between a data vector and a query is within a threshold, are a critical performance bottleneck in vector similarity search. Recent DCO methods that avoid full-dimensional distance computations promise significant speedups, but their readiness for production vector database systems remains an open question. To address this, we conduct a comprehensive benchmark of 8 DCO algorithms across 10 datasets (with up to 100M vectors and 12,288 dimensions) and diverse hardware configurations (CPUs with/without SIMD, and GPUs). Our study reveals that these methods are not silver bullets: their efficiency is highly sensitive to data dimensionality, degrades under out-of-distribution queries, and is unstable across hardware. Yet, our evaluation also demonstrates often-overlooked merits: they can accelerate index construction and data updates. Despite these benefits, their unstable performance, which can be slower than a full-dimensional scan, leads us to conclude that recent algorithmic advancements in DCO are not yet ready for production deployment.
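To make the definition above concrete, here is a minimal sketch of a DCO in Python. It is not one of the 8 benchmarked algorithms. It contrasts a full-dimensional check with classic partial-distance early termination, which relies only on the fact that a squared Euclidean distance accumulated over a prefix of the dimensions never decreases as more dimensions are added. The function names `dco_full` and `dco_early_abandon` and the `block` parameter are assumptions introduced for illustration.

```python
from typing import Sequence, Tuple


def dco_full(x: Sequence[float], q: Sequence[float], threshold: float) -> bool:
    """Baseline DCO: compute the full squared Euclidean distance, then compare."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(x, q))
    return dist_sq <= threshold ** 2


def dco_early_abandon(
    x: Sequence[float], q: Sequence[float], threshold: float, block: int = 4
) -> Tuple[bool, int]:
    """Illustrative DCO that can stop before scanning every dimension.

    Returns (within_threshold, dimensions_scanned). The partial sum is
    checked once per block of `block` dimensions (a hypothetical parameter).
    """
    limit = threshold ** 2
    partial = 0.0
    d = len(x)
    for start in range(0, d, block):
        end = min(start + block, d)
        for i in range(start, end):
            diff = x[i] - q[i]
            partial += diff * diff
        # The partial sum only grows, so exceeding the limit is final.
        if partial > limit:
            return False, end
    return True, d


if __name__ == "__main__":
    query = [0.0] * 16
    near = [0.1] * 16   # well inside the threshold
    far = [5.0] * 16    # far outside the threshold
    print(dco_early_abandon(near, query, 1.0))
    print(dco_early_abandon(far, query, 1.0))
```

In this sketch, a vector far from the query is rejected after the first block, while a vector inside the threshold still requires a scan over all dimensions, so the saving depends on the data and the query. The per-block check also adds a branch that the full scan does not have, which is one plausible reason such methods can behave differently across hardware.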

