Wenxuan Xia

44.6 · DB · May 28

E2E: Efficient Filtered AKNN Search via Adaptive Termination

Wenxuan Xia, Mingyu Yang, Wentao Li et al.

Approximate k-Nearest Neighbor (AKNN) search is widely used in vector databases. When vectors carry additional attributes (e.g., labels or numerical values), filtered AKNN search retrieves the nearest vectors to a query vector under attribute constraints. Most existing methods use a fixed termination condition, searching the entire index while respecting attribute filters. However, this leads to substantial redundant computation: different queries require different amounts of search effort, so a fixed condition misses early-termination opportunities for easy queries. This paper proposes a lightweight model that estimates the search cost of filtered AKNN queries and enables adaptive termination: for easy queries, the search stops early to reduce latency, while for hard queries, it continues longer to preserve accuracy. The key challenge is accurate cost prediction under attribute filters. To address this, we show that information collected during an early probing phase (e.g., attribute distributions and intermediate distance statistics) can effectively predict the overall search cost. Experiments on six real-world datasets demonstrate a 1.1x-3.7x speedup over state-of-the-art baselines at 95% recall, while maintaining search accuracy.
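The probe-then-adapt idea in this abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the learned cost model is replaced by a hypothetical heuristic that uses only the filter pass rate observed during probing (the abstract also mentions distance statistics, which are ignored here). The names `adaptive_filtered_search`, `estimate_budget`, `probe_size`, and `slack` are assumptions made for illustration, and `candidates` stands in for whatever order a real index traversal would produce.

```python
import heapq
import math


def estimate_budget(passed, visited, k, slack, probe_size, max_visits):
    """Hypothetical cost heuristic: rarer filter matches get a larger visit budget."""
    selectivity = max(passed, 1) / visited
    wanted = math.ceil(slack * k / selectivity)
    return min(max_visits, max(probe_size, wanted))


def adaptive_filtered_search(query, candidates, passes_filter, k,
                             probe_size=32, slack=4.0, max_visits=10_000):
    """Return (top-k list of (distance, id), number of candidates visited)."""
    top = []              # max-heap on distance, stored as negated values
    visited = passed = 0
    budget = probe_size   # until probing finishes, the budget is the probe size
    for vec_id, vec in candidates:
        if visited >= budget:
            break         # adaptive termination point
        visited += 1
        if passes_filter(vec_id):
            passed += 1
            d = math.dist(query, vec)
            if len(top) < k:
                heapq.heappush(top, (-d, vec_id))
            elif d < -top[0][0]:
                heapq.heapreplace(top, (-d, vec_id))
        if visited == probe_size:
            # Probing phase is over: pick the final budget from what was observed.
            budget = estimate_budget(passed, visited, k, slack,
                                     probe_size, max_visits)
    return sorted((-neg_d, i) for neg_d, i in top), visited


# Toy data: 1,000 one-dimensional vectors, visited in id order.
data = [(i, (float(i),)) for i in range(1000)]
easy_hits, easy_visits = adaptive_filtered_search((0.0,), data, lambda i: True, k=5)
hard_hits, hard_visits = adaptive_filtered_search((0.0,), data, lambda i: i % 10 == 0, k=5)
```

In this toy run the unselective filter stops right after the probe, while the filter that admits one id in ten earns a larger budget, which mirrors the easy-versus-hard distinction drawn in the abstract.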

60.5 · DB · Jun 2

HRNN: A Hybrid Graph Index for Approximate Reverse k-Nearest Neighbor Search on High-Dimensional Vectors

Wenxuan Xia, Mingyu Yang, Wentao Li et al.

Reverse k-nearest neighbor (RkNN) search returns all data points that regard a query vector as one of their k-nearest neighbors (kNNs). Existing RkNN methods typically follow a filter-and-verification framework: vectors near the query vector are first collected as candidates and then verified against their kNN-radius (i.e., the distance to their k-th nearest neighbor). However, existing methods face two key limitations in high-dimensional spaces. First, nearby vectors often do not belong to the query's true RkNN set, resulting in excessive candidate expansion overhead. Second, existing methods compute the kNN-radius online during verification, incurring substantial query-processing cost. To address these limitations, we propose HRNN, a hybrid graph index for approximate RkNN search. (1) Rather than directly treating nearby vectors as RkNN candidates, HRNN uses them as proxy points, based on the assumption that a query's RkNN results can often be discovered through the RkNN results of its nearby vectors. (2) To reduce verification cost, HRNN materializes high-fidelity kNN-radius values offline, eliminating expensive online reconstruction while preserving accuracy. HRNN combines a navigation graph, a ranked kNN graph, and reverse-neighbor lists into a hybrid index that supports efficient proxy retrieval, candidate generation, and kNN-radius access. We also develop efficient index construction and append-only maintenance algorithms. Extensive experiments show that HRNN consistently outperforms existing methods, achieving up to one order of magnitude higher throughput. Moreover, HRNN scales to datasets containing up to 10 million high-dimensional vectors while supporting efficient dynamic index maintenance.
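The verification step described in this abstract, checking a candidate against a kNN-radius computed ahead of time, can be shown with a small brute-force Python sketch. It is exact rather than approximate and does not reproduce HRNN's graph index or proxy-point retrieval. The function names `knn_radii` and `reverse_knn` are hypothetical, and treating a tie (distance equal to the radius) as a match is an assumption of this sketch.

```python
import math


def knn_radii(points, k):
    """Offline step: distance from each point to its k-th nearest other point."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, other)
                       for j, other in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def reverse_knn(query, points, radii, candidate_ids=None):
    """Online step: keep candidates whose kNN-radius covers the query.

    With candidate_ids=None every point is verified (brute force); a real
    index would pass a much smaller candidate set here.
    """
    ids = range(len(points)) if candidate_ids is None else candidate_ids
    return [i for i in ids if math.dist(query, points[i]) <= radii[i]]


points = [(0.0,), (1.0,), (5.0,), (6.0,)]
radii = knn_radii(points, k=1)            # computed once, reused for every query
result = reverse_knn((0.4,), points, radii)
```

Because the radii are computed once and stored, each verification costs a single distance computation and a comparison, which is the saving the abstract attributes to offline materialization.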

2 Papers