DS CG IR · Sep 9, 2015

Practical and Optimal LSH for Angular Distance

arXiv:1509.02897v1 · 549 citations
Originality: Highly original
AI Analysis

This work addresses the need for efficient similarity search over high-dimensional data in areas such as machine learning and information retrieval. It provides a practical, near-optimal solution for angular distance that builds incrementally on prior, largely theoretical methods.

The paper tackles approximate near neighbor search for angular distance by introducing a practical Locality-Sensitive Hashing (LSH) family that achieves an asymptotically optimal running-time exponent and, in experiments on real and synthetic datasets, improves on hyperplane LSH in practice.
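For context, the baseline here is hyperplane LSH [Charikar, 2002]: each hash bit is the sign of the dot product with a random Gaussian vector, and two vectors at angle θ agree on a bit with probability 1 - θ/π. The minimal Python sketch below computes that collision probability, the standard LSH exponent ρ = ln(1/p1)/ln(1/p2) that governs the roughly n^ρ query time, and an empirical check of the collision rate. The function names are illustrative assumptions, not taken from the paper or its code repository.

```python
import math
import random

def random_planes(dim, k, rng):
    """k random Gaussian normal vectors in R^dim (one per hash bit)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]

def hyperplane_hash(x, planes):
    """One bit per hyperplane: 1 if <x, r> >= 0, else 0."""
    return [1 if sum(xi * ri for xi, ri in zip(x, r)) >= 0 else 0 for r in planes]

def collision_probability(theta):
    """Per-bit collision probability for vectors at angle theta: 1 - theta / pi."""
    return 1.0 - theta / math.pi

def rho(theta_near, theta_far):
    """LSH exponent ln(1/p1) / ln(1/p2); query time grows roughly as n ** rho."""
    p1 = collision_probability(theta_near)
    p2 = collision_probability(theta_far)
    return math.log(1.0 / p1) / math.log(1.0 / p2)

rng = random.Random(0)
planes = random_planes(2, 20000, rng)
x, y = [1.0, 0.0], [0.0, 1.0]  # angle pi / 2 between x and y
hx, hy = hyperplane_hash(x, planes), hyperplane_hash(y, planes)
empirical = sum(a == b for a, b in zip(hx, hy)) / len(planes)
print(empirical, collision_probability(math.pi / 2))  # empirical should be close to 0.5
print(round(rho(math.pi / 4, math.pi / 2), 3))  # → 0.415
```

With a near angle of π/4 and a far angle of π/2, hyperplane LSH gives ρ ≈ 0.415. The abstract's point is that the new family attains the asymptotically optimal exponent while staying cheap to evaluate.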

We show the existence of a Locality-Sensitive Hashing (LSH) family for the angular distance that yields an approximate Near Neighbor Search algorithm with the asymptotically optimal running time exponent. Unlike earlier algorithms with this property (e.g., Spherical LSH [Andoni, Indyk, Nguyen, Razenshteyn 2014], [Andoni, Razenshteyn 2015]), our algorithm is also practical, improving upon the well-studied hyperplane LSH [Charikar, 2002] in practice. We also introduce a multiprobe version of this algorithm, and conduct an experimental evaluation on real and synthetic data sets. We complement the above positive results with a fine-grained lower bound for the quality of any LSH family for angular distance. Our lower bound implies that the above LSH family exhibits a trade-off between evaluation time and quality that is close to optimal for a natural class of LSH functions.
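The abstract does not name the new family. Assuming it is cross-polytope LSH, in which a vector is randomly rotated and hashed to the nearest of the 2d signed basis vectors ±e_i, a minimal Python sketch follows. Several parts are illustrative assumptions rather than the paper's implementation: the dense Gaussian matrix standing in for a random rotation, all function names, and the multiprobe ranking, which is a simplified single-function heuristic and not the paper's exact probing scheme.

```python
import random

def gaussian_matrix(dim, rng):
    """Dense dim x dim Gaussian matrix; stands in for a random rotation (assumption)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]

def rotate(G, x):
    """Matrix-vector product G @ x in plain Python."""
    return [sum(g * xi for g, xi in zip(row, x)) for row in G]

def cross_polytope_hash(G, x):
    """Closest signed basis vector +-e_i to Gx, encoded as (i, sign)."""
    y = rotate(G, x)
    i = max(range(len(y)), key=lambda j: abs(y[j]))
    return (i, 1 if y[i] >= 0 else -1)

def probe_sequence(G, x, num_probes):
    """Simplified multiprobe order for ONE hash function (hypothetical heuristic):
    rank all 2*dim buckets (i, s) by s * (Gx)_i, best first.
    The first entry is the bucket of x itself."""
    y = rotate(G, x)
    scored = [(s * y[i], (i, s)) for i in range(len(y)) for s in (1, -1)]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [bucket for _, bucket in scored[:num_probes]]

def collision_rate(x, y, trials, rng):
    """Fraction of freshly drawn hash functions under which x and y collide."""
    hits = 0
    for _ in range(trials):
        G = gaussian_matrix(len(x), rng)
        hits += cross_polytope_hash(G, x) == cross_polytope_hash(G, y)
    return hits / trials

rng = random.Random(1)
dim = 8
x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
near = [xi + 0.1 * rng.gauss(0.0, 1.0) for xi in x]  # small angle to x
far = [rng.gauss(0.0, 1.0) for _ in range(dim)]      # typically close to orthogonal to x
print(collision_rate(x, near, 500, rng), collision_rate(x, far, 500, rng))

G = gaussian_matrix(dim, rng)
probes = probe_sequence(G, x, 3)
print(probes)  # the bucket of x, then the two next-best buckets
```

Near vectors should collide much more often than roughly orthogonal ones. The probe list illustrates the idea behind multiprobe: instead of building more hash tables, a query also checks the next-closest buckets of the tables it already has.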

Code Implementations: 1 repo
