CVAIDSIRJul 11, 2013

Fast Exact Search in Hamming Space with Multi-Index Hashing

arXiv:1307.2982v3170 citations
Originality Highly original
AI Analysis

This addresses the need for fast and exact search in large-scale binary code datasets, such as in image retrieval, with a novel approach that overcomes prior limitations.

The paper tackles the problem of exact nearest neighbor search in Hamming space for long binary codes, which was previously thought ineffective, by introducing a multi-index hashing method that enables efficient search with sub-linear runtime and demonstrates dramatic speedups on datasets up to one billion codes.

There is growing interest in representing image data and feature descriptors using compact binary codes for fast near neighbor search. Although binary codes are motivated by their use as direct indices (addresses) into a hash table, codes longer than 32 bits are not being used as such, as it was thought to be ineffective. We introduce a rigorous way to build multiple hash tables on binary code substrings that enables exact k-nearest neighbor search in Hamming space. The approach is storage efficient and straightforward to implement. Theoretical analysis shows that the algorithm exhibits sub-linear run-time behavior for uniformly distributed codes. Empirical results show dramatic speedups over a linear scan baseline for datasets of up to one billion codes of 64, 128, or 256 bits.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes