Accelerating Large-Scale Inference with Anisotropic Vector Quantization
This work addresses efficient large-scale inference for applications such as recommendation systems, though it may appear incremental because it builds on existing quantization techniques.
The paper tackles the problem of scaling maximum inner product search to large databases. It introduces anisotropic quantization loss functions that penalize the component of a datapoint's residual parallel to the datapoint more heavily than the orthogonal component, and it reports state-of-the-art results on public benchmarks.
Quantization-based techniques are the current state-of-the-art for scaling maximum inner product search to massive databases. Traditional approaches to quantization aim to minimize the reconstruction error of the database points. Based on the observation that, for a given query, the database points with the largest inner products are the most relevant, we develop a family of anisotropic quantization loss functions. Under natural statistical assumptions, we show that quantization with these loss functions leads to a new variant of vector quantization that penalizes the parallel component of a datapoint's residual more heavily than its orthogonal component. The proposed approach achieves state-of-the-art results on the public benchmarks available at ann-benchmarks.com.
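The difference between a reconstruction loss and an anisotropic loss can be illustrated with a minimal sketch. It is not the paper's implementation: the function names and the weights `parallel_weight` and `orthogonal_weight` are hypothetical, and the values 4.0 and 1.0 are arbitrary choices for illustration rather than weights derived from the paper's statistical assumptions. The sketch splits the residual x - x_quantized into a part parallel to the datapoint x and a part orthogonal to it, then weights the two squared norms separately. For a query pointing along x, only the parallel part of the residual changes the inner product, which is one way to see why it might deserve the larger weight.

```python
def anisotropic_loss(x, x_quantized, parallel_weight, orthogonal_weight):
    """Weighted squared error of the residual, split relative to x.

    The two weights are illustrative assumptions, not values from the paper.
    """
    norm_sq = sum(v * v for v in x)
    if norm_sq == 0.0:
        raise ValueError("x must be nonzero to define a parallel direction")

    residual = [a - b for a, b in zip(x, x_quantized)]

    # Project the residual onto x to obtain its parallel component.
    scale = sum(r * v for r, v in zip(residual, x)) / norm_sq
    parallel = [scale * v for v in x]
    # Whatever remains is orthogonal to x.
    orthogonal = [r - p for r, p in zip(residual, parallel)]

    parallel_sq = sum(p * p for p in parallel)
    orthogonal_sq = sum(o * o for o in orthogonal)
    return parallel_weight * parallel_sq + orthogonal_weight * orthogonal_sq


def reconstruction_error(x, x_quantized):
    """Plain squared Euclidean error, the traditional quantization objective."""
    return sum((a - b) ** 2 for a, b in zip(x, x_quantized))


# Two candidate quantizations of the same datapoint, with equal plain error.
x = [1.0, 0.0]
shrunk = [0.8, 0.0]   # residual lies entirely along x
rotated = [1.0, 0.2]  # residual lies entirely orthogonal to x

loss_shrunk = anisotropic_loss(x, shrunk, 4.0, 1.0)
loss_rotated = anisotropic_loss(x, rotated, 4.0, 1.0)
```

Both candidates have the same reconstruction error, so a traditional objective cannot distinguish them. Under the weighted loss, `shrunk` scores worse than `rotated` because its error lies entirely in the parallel direction.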