LG MLSep 24, 2018

Norm-Ranging LSH for Maximum Inner Product Search

Xiao Yan, Jinfeng Li, Xinyan Dai, Hongzhi Chen, James Cheng

arXiv:1809.08782v29.161 citationsHas Code

Originality Highly original

AI Analysis

This work addresses a bottleneck in hashing-based MIPS methods for applications like recommendation systems, offering a significant performance improvement over the state-of-the-art.

The paper tackles the problem of long tails in 2-norm distributions degrading the performance of Simple-LSH for maximum inner product search, proposing Norm-ranging LSH which partitions datasets into sub-datasets to reduce query time complexity and achieves an order of magnitude speedup over Simple-LSH for the same recall.

Neyshabur and Srebro proposed Simple-LSH, which is the state-of-the-art hashing method for maximum inner product search (MIPS) with performance guarantee. We found that the performance of Simple-LSH, in both theory and practice, suffers from long tails in the 2-norm distribution of real datasets. We propose Norm-ranging LSH, which addresses the excessive normalization problem caused by long tails in Simple-LSH by partitioning a dataset into multiple sub-datasets and building a hash index for each sub-dataset independently. We prove that Norm-ranging LSH has lower query time complexity than Simple-LSH. We also show that the idea of partitioning the dataset can improve other hashing based methods for MIPS. To support efficient query processing on the hash indexes of the sub-datasets, a novel similarity metric is formulated. Experiments show that Norm-ranging LSH achieves an order of magnitude speedup over Simple-LSH for the same recall, thus significantly benefiting applications that involve MIPS.

View on arXiv PDF Code

Similar