Work Sharing and Offloading for Efficient Approximate Threshold-based Vector Join
This work addresses performance bottlenecks in vector and vector-relational databases used for multimodal retrieval and semantic analytics, representing an incremental improvement over existing methods.
The paper tackled the problem of inefficient approximate vector joins in database systems by proposing a unified framework with soft work sharing, a merged index, and adaptive hybrid search, achieving substantial improvements in efficiency-recall trade-off on eight datasets.
Vector joins - finding all vector pairs between a set of query and data vectors whose distances are below a given threshold - are fundamental to modern vector and vector-relational database systems that power multimodal retrieval and semantic analytics. Existing state-of-the-art approach exploits work sharing among similar queries but still suffers from redundant index traversals and excessive distance computations. We propose a unified framework for efficient approximate vector joins that (1) introduces soft work sharing to reuse traversal results beyond the join results of previous queries, (2) builds a merged index over both query and data vectors to further speedup graph explorations, and (3) improves robustness for out-of-distribution queries through an adaptive hybrid search strategy. Experiments on eight datasets demonstrate substantial improvements in efficiency-recall trade-off over the state of the art.