Learning Hash Codes via Hamming Distance Targets
This work addresses efficient similarity search for information retrieval. It builds incrementally on prior hashing methods while reporting significant improvements in both accuracy and speed.
The authors tackle the problem of learning binary hash codes for similarity search by introducing a new loss function and training scheme. They report state-of-the-art results: MAP on ImageNet improves from 73% to 84%, and query cost on SIFT 1M falls by a factor of 2-8.
We present a powerful new loss function and training scheme for learning binary hash codes with any differentiable model and similarity function. Our loss function improves over prior methods by using log-likelihood loss on top of an accurate approximation for the probability that two inputs fall within a Hamming distance target. Our novel training scheme obtains a good estimate of the true gradient by better sampling inputs and evaluating loss terms between all pairs of inputs in each minibatch. To fully leverage the resulting hashes, we use multi-indexing. We demonstrate that these techniques provide large improvements on similarity search tasks. We report the best results to date on competitive information retrieval tasks for ImageNet and SIFT 1M, improving MAP from 73% to 84% and reducing query cost by a factor of 2-8, respectively.
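To make the multi-indexing step concrete, the following is a minimal sketch of generic multi-index hashing over integer-encoded binary codes. It is not the authors' implementation. The names `MultiIndex`, `substrings`, `neighbors`, and `hamming` are hypothetical, and the sketch assumes the code length divides evenly into `m` chunks. It relies on the pigeonhole principle: if two codes differ in at most `r` bits, at least one of their `m` chunks differs in at most `r // m` bits, so it is enough to probe each chunk table within that smaller radius and then verify candidates with the full Hamming distance.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of differing bits between two integer-encoded codes."""
    return bin(a ^ b).count("1")


def substrings(code, n_bits, m):
    """Split an n_bits-wide code into m equal-width chunks (low bits first)."""
    width = n_bits // m
    mask = (1 << width) - 1
    return [(code >> (i * width)) & mask for i in range(m)]


def neighbors(chunk, width, radius):
    """Yield every width-bit value within `radius` bit flips of `chunk`."""
    for k in range(radius + 1):
        for positions in combinations(range(width), k):
            flipped = chunk
            for p in positions:
                flipped ^= 1 << p
            yield flipped


class MultiIndex:
    """Hypothetical multi-index: one hash table per chunk of the code."""

    def __init__(self, codes, n_bits, m):
        # Assumption of this sketch: chunks all have the same width.
        assert n_bits % m == 0, "n_bits must be divisible by m"
        self.codes = list(codes)
        self.n_bits = n_bits
        self.m = m
        self.tables = [defaultdict(list) for _ in range(m)]
        for idx, code in enumerate(self.codes):
            for t, chunk in enumerate(substrings(code, n_bits, m)):
                self.tables[t][chunk].append(idx)

    def query(self, code, r):
        """Return sorted indices of stored codes within Hamming distance r."""
        width = self.n_bits // self.m
        sub_radius = r // self.m  # pigeonhole bound on the closest chunk
        candidates = set()
        for t, chunk in enumerate(substrings(code, self.n_bits, self.m)):
            for near in neighbors(chunk, width, sub_radius):
                candidates.update(self.tables[t].get(near, ()))
        # Candidates only match on one chunk; verify against the full code.
        return sorted(
            i for i in candidates if hamming(self.codes[i], code) <= r
        )


if __name__ == "__main__":
    index = MultiIndex([0b00000000, 0b00000011, 0b11110000, 0b11111111],
                       n_bits=8, m=4)
    print(index.query(0b00000001, r=1))
```

The design trade-off in this sketch is that more chunks shrink the per-chunk search radius (fewer probes per table) but make each chunk shorter, so more stored codes collide in each bucket and must be verified.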