DS · CV · DB · Aug 13, 2014

Hashing for Similarity Search: A Survey

arXiv:1408.2927v1 · 576 citations
Originality: Synthesis-oriented
AI Analysis

The survey gives researchers and practitioners working on approximate nearest neighbor search a comprehensive overview, but its contribution is incremental: it synthesizes existing work without presenting new results.

The paper surveys hashing methods for similarity search, categorizing them into locality sensitive hashing and learning to hash, and reviews aspects like hash function design and search schemes.

Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are the smallest. Various methods have been developed to address this problem, and recently much effort has been devoted to approximate search. In this paper, we present a survey of one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide the hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, and the distance measure and search scheme in the hash coding space.
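To make the first category concrete, here is a minimal sketch of one well-known data-independent scheme, random hyperplane hashing for angular similarity, with candidates ranked by Hamming distance in the hash coding space. The abstract does not single out this family; it is chosen here only for illustration, and the function names (make_hyperplanes, hash_code, hamming, search) and the toy data are hypothetical.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw random Gaussian hyperplanes. No data is inspected, so the
    hash functions are data-independent (the LSH category)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def hash_code(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector is on its non-negative side."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )

def hamming(a, b):
    """Number of positions where two hash codes differ."""
    return sum(x != y for x, y in zip(a, b))

def search(query, database, hyperplanes, k=1):
    """Return indices of the k items whose codes are closest to the query's
    code in Hamming distance (ties keep database order)."""
    q = hash_code(query, hyperplanes)
    codes = [hash_code(v, hyperplanes) for v in database]
    order = sorted(range(len(database)), key=lambda i: hamming(q, codes[i]))
    return order[:k]

# Toy example (illustrative data only).
database = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [0.0, -1.0]]
planes = make_hyperplanes(num_bits=16, dim=2)
nearest = search([1.0, 0.05], database, planes, k=2)
```

A learning-to-hash method would keep the same hash_code and search steps but fit the hyperplanes to the data rather than drawing them at random.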
