Optimistic Query Routing in Clustering-based Approximate Maximum Inner Product Search
This work addresses a specific bottleneck in approximate nearest neighbor search for applications like recommendation systems, offering significant efficiency gains, though it is incremental as it builds on existing clustering-based methods.
The paper tackles the problem of efficiently identifying which shards to probe in clustering-based maximum inner product search, a bottleneck that has received little attention, and presents an algorithm that achieves the same accuracy as state-of-the-art routers while probing up to 50% fewer points on benchmark datasets.
Clustering-based nearest neighbor search is an effective method in which points are partitioned into geometric shards to form an index, with only a few shards searched during query processing to find a set of top-$k$ vectors. Even though the search efficacy is heavily influenced by the algorithm that identifies the shards to probe, it has received little attention in the literature. This work bridges that gap by studying routing in clustering-based maximum inner product search. We unpack existing routers and notice the surprising contribution of optimism. We then take a page from the sequential decision making literature and formalize that insight following the principle of ``optimism in the face of uncertainty.'' In particular, we present a framework that incorporates the moments of the distribution of inner products within each shard to estimate the maximum inner product. We then present an instance of our algorithm that uses only the first two moments to reach the same accuracy as state-of-the-art routers such as ScaNN by probing up to $50\%$ fewer points on benchmark datasets. Our algorithm is also space-efficient: we design a sketch of the second moment whose size is independent of the number of points and requires $\mathcal{O}(1)$ vectors per shard.