LGMLSep 24, 2018

Norm-Ranging LSH for Maximum Inner Product Search

arXiv:1809.08782v261 citations
Originality Highly original
AI Analysis

This work addresses a bottleneck in hashing-based MIPS methods for applications like recommendation systems, offering a significant performance improvement over the state-of-the-art.

The paper tackles the problem of long tails in 2-norm distributions degrading the performance of Simple-LSH for maximum inner product search, proposing Norm-ranging LSH which partitions datasets into sub-datasets to reduce query time complexity and achieves an order of magnitude speedup over Simple-LSH for the same recall.

Neyshabur and Srebro proposed Simple-LSH, which is the state-of-the-art hashing method for maximum inner product search (MIPS) with performance guarantee. We found that the performance of Simple-LSH, in both theory and practice, suffers from long tails in the 2-norm distribution of real datasets. We propose Norm-ranging LSH, which addresses the excessive normalization problem caused by long tails in Simple-LSH by partitioning a dataset into multiple sub-datasets and building a hash index for each sub-dataset independently. We prove that Norm-ranging LSH has lower query time complexity than Simple-LSH. We also show that the idea of partitioning the dataset can improve other hashing based methods for MIPS. To support efficient query processing on the hash indexes of the sub-datasets, a novel similarity metric is formulated. Experiments show that Norm-ranging LSH achieves an order of magnitude speedup over Simple-LSH for the same recall, thus significantly benefiting applications that involve MIPS.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes