Approximate Nearest Neighbor Search with Window Filters
This addresses a practical bottleneck in semantic search systems where users need to filter results by metadata like timestamps or costs, though it appears to be an incremental improvement over existing ANN methods.
The paper tackles the problem of approximate nearest neighbor search with numeric label filters (window search), which is relevant for semantic search applications like timestamp-filtered image/document search. They propose a modular tree-based framework that transforms traditional ANN indexes into window search structures, achieving up to 75× speedup over existing solutions while maintaining recall.
We define and investigate the problem of $\textit{c-approximate window search}$: approximate nearest neighbor search where each point in the dataset has a numeric label, and the goal is to find nearest neighbors to queries within arbitrary label ranges. Many semantic search problems, such as image and document search with timestamp filters, or product search with cost filters, are natural examples of this problem. We propose and theoretically analyze a modular tree-based framework for transforming an index that solves the traditional c-approximate nearest neighbor problem into a data structure that solves window search. On standard nearest neighbor benchmark datasets equipped with random label values, adversarially constructed embeddings, and image search embeddings with real timestamps, we obtain up to a $75\times$ speedup over existing solutions at the same level of recall.