Hashing for Similarity Search: A Survey
The survey provides a comprehensive overview for researchers and practitioners working on approximate nearest neighbor search, but it is incremental: it synthesizes existing work rather than presenting new results.
The paper surveys hashing methods for similarity search, categorizing them into locality sensitive hashing and learning to hash, and reviews aspects like hash function design and search schemes.
Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are the smallest. Various methods have been developed to address this problem, and much recent effort has been devoted to approximate search. In this paper, we present a survey of one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review both from various aspects, including hash function design, the distance measure, and the search scheme in the hash coding space.
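To make "hash function design" and "search scheme in the hash coding space" concrete, here is a minimal Python sketch. It is an illustration rather than code from the survey. It assumes random-hyperplane (sign random projection) hashing as the locality sensitive hash family, which targets angular distance between real-valued vectors. The class names `HyperplaneLSH` and `HashIndex` and the parameters `num_bits` and `seed` are hypothetical. The hyperplanes are drawn without looking at the data, which is what makes the scheme data-independent. Two common search schemes over the resulting binary codes are shown: hash table (bucket) lookup and Hamming ranking.

```python
# Illustrative sketch only; class, method, and parameter names are hypothetical.
import random
from collections import defaultdict


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def hamming(a, b):
    """Number of positions at which two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


class HyperplaneLSH:
    """Random-hyperplane (sign random projection) hashing for angular distance.

    Hyperplane normals are drawn i.i.d. from a Gaussian without looking at any
    data, so the hash family is data-independent, as in LSH.
    """

    def __init__(self, dim, num_bits, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_bits)]

    def code(self, x):
        # One bit per hyperplane: 1 if x lies on the non-negative side.
        return tuple(1 if dot(p, x) >= 0 else 0 for p in self.planes)


class HashIndex:
    """Stores binary codes and supports two search schemes over them."""

    def __init__(self, lsh):
        self.lsh = lsh
        self.entries = []               # (item_id, code) pairs in insertion order
        self.table = defaultdict(list)  # code -> item ids sharing that code

    def add(self, item_id, x):
        c = self.lsh.code(x)
        self.entries.append((item_id, c))
        self.table[c].append(item_id)

    def bucket_lookup(self, q):
        # Hash table lookup: candidates whose code equals the query's code.
        return list(self.table.get(self.lsh.code(q), []))

    def hamming_rank(self, q, k):
        # Hamming ranking: sort all items by Hamming distance to the query code.
        # sorted() is stable, so ties keep insertion order.
        cq = self.lsh.code(q)
        ranked = sorted(self.entries, key=lambda e: hamming(e[1], cq))
        return [item_id for item_id, _ in ranked[:k]]


if __name__ == "__main__":
    index = HashIndex(HyperplaneLSH(dim=3, num_bits=16, seed=42))
    index.add("a", [1.0, 0.0, 0.0])
    index.add("b", [0.9, 0.1, 0.0])
    index.add("c", [-1.0, 0.0, 0.0])
    query = [1.0, 0.05, 0.0]
    print("same bucket:", index.bucket_lookup(query))
    print("top-2 by Hamming distance:", index.hamming_rank(query, k=2))
```

Bucket lookup only touches one table entry, but it misses a near neighbor that falls on the other side of even one hyperplane. For this reason LSH is usually deployed with several independent tables. Hamming ranking scans every code but always returns `k` candidates. A learning-to-hash method could reuse the same coding and search machinery and replace the random `planes` with projections fitted to the data distribution.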