DBDSIRFeb 8, 2016

Scalability and Total Recall with Fast CoveringLSH

arXiv:1602.02620v21 citations
AI Analysis

This addresses the need for precise performance guarantees in similarity search applications, offering a practical solution with improved efficiency.

The paper tackles the problem of false negatives in locality-sensitive hashing (LSH) for similarity search by proposing Fast CoveringLSH (fcLSH), which eliminates false negatives and achieves an asymptotic improvement in hash function computation time from O(dL) to O(d + L log L). Experiments show it is comparable or superior to traditional methods for search radii up to 20 in high-dimensional Hamming space.

Locality-sensitive hashing (LSH) has emerged as the dominant algorithmic technique for similarity search with strong performance guarantees in high-dimensional spaces. A drawback of traditional LSH schemes is that they may have \emph{false negatives}, i.e., the recall is less than 100\%. This limits the applicability of LSH in settings requiring precise performance guarantees. Building on the recent theoretical "CoveringLSH" construction that eliminates false negatives, we propose a fast and practical covering LSH scheme for Hamming space called \emph{Fast CoveringLSH (fcLSH)}. Inheriting the design benefits of CoveringLSH our method avoids false negatives and always reports all near neighbors. Compared to CoveringLSH we achieve an asymptotic improvement to the hash function computation time from $\mathcal{O}(dL)$ to $\mathcal{O}(d + L\log{L})$, where $d$ is the dimensionality of data and $L$ is the number of hash tables. Our experiments on synthetic and real-world data sets demonstrate that \emph{fcLSH} is comparable (and often superior) to traditional hashing-based approaches for search radius up to 20 in high-dimensional Hamming space.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes