DBMar 26

TaCo: Data-adaptive and Query-aware Subspace Collision for High-dimensional Approximate Nearest Neighbor Search

arXiv:2603.2491986.61 citationsh-index: 57
AI Analysis

This work addresses efficiency issues in high-dimensional similarity search for applications like databases and machine learning, representing an incremental improvement over the subspace collision framework.

The paper tackles the problem of imbalanced index construction and wasted query overhead in the subspace collision framework for high-dimensional approximate nearest neighbor search by introducing a data-adaptive and query-aware method called TaCo, which achieves up to 8x speedup in indexing, reduces memory footprint to 0.6x, and provides over 1.5x query throughput compared to state-of-the-art methods.

Approximate Nearest Neighbor Search (ANNS) in high-dimensional Euclidean spaces is a fundamental problem with broad applications. Subspace Collision is a newly proposed ANNS framework that provides a novel paradigm for similarity search and achieves superior indexing and query performance. However, the subspace collision framework remains data-agnostic and query-oblivious, resulting in imbalanced index construction and wasted query overhead. In this paper, we address these limitations from two aspects: first, we design a subspace-oriented data transformation mechanism by averaging the entropies computed over each subspace of the transformed data, which ensures balanced subspace partitioning (in an information theoretical sense) and enables data-adaptive subspace collision; second, we present query-aware and scalable query strategies that dynamically allocate overhead for each query and accelerate collision probing within subspaces. Building on these ideas, we propose a novel data-adaptive and query-aware subspace collision method, abbreviated as TaCo, which achieves efficient and accurate ANN search while maintaining an excellent balance between indexing and query performance. Extensive experiments on real-world datasets demonstrate that, when compared to state-of-the-art subspace collision methods, TaCo achieves up to 8x speedup in indexing and reduces to 0.6x memory footprint, while achieving over 1.5x query throughput. Moreover, TaCo achieves state-of-the-art indexing performance and provides an effective balance between indexing and query efficiency, even when compared with advanced methods beyond the subspace-collision paradigm. This paper was published in SIGMOD 2026.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes