Global Hashing System for Fast Image Search
This work addresses the need for efficient image search in large-scale applications, representing an incremental improvement over existing hashing methods.
The paper tackles fast approximate nearest neighbor search in large image datasets. It proposes a two-step hashing method that first embeds data points in a low-dimensional space and then adapts a global positioning system approach to produce the binary embedding. Experiments show that the data-dependent variant outperforms other methods on datasets ranging from 100k to 10M points.
Hashing methods have been widely investigated for fast approximate nearest neighbor search in large data sets. Most existing methods use binary vectors in lower-dimensional spaces to represent data points that are usually real vectors of higher dimensionality. We divide the hashing process into two steps. Data points are first embedded in a low-dimensional space, and the global positioning system method is then introduced and modified for binary embedding. We devise data-independent and data-dependent methods to distribute the satellites at appropriate locations. Our methods are based on finding the tradeoff between the information losses in these two steps. Experiments show that our data-dependent method outperforms other methods on data sets of different sizes, from 100k to 10M points. By incorporating the orthogonality of the code matrix, both our data-independent and data-dependent methods perform particularly well in experiments with longer codes.
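The two-step structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the satellite placement or the binarization rule, so the code below assumes PCA for the low-dimensional embedding, randomly placed satellites as a stand-in for a data-independent placement, and one bit per satellite obtained by thresholding the point-to-satellite distance at its median. The names `pca_embed`, `satellite_codes`, and `hamming` are hypothetical helpers introduced only for this example.

```python
import numpy as np

def pca_embed(X, dim):
    """Step 1 (assumed): project centered data onto its top `dim` principal directions."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:dim].T

def satellite_codes(Z, satellites, thresholds=None):
    """Step 2 (assumed rule): one bit per satellite, set when the point lies
    farther from that satellite than the threshold distance."""
    # Pairwise Euclidean distances, shape (n_points, n_satellites).
    d = np.linalg.norm(Z[:, None, :] - satellites[None, :, :], axis=2)
    if thresholds is None:
        # Median split per satellite, so each bit is roughly balanced.
        thresholds = np.median(d, axis=0)
    return (d > thresholds).astype(np.uint8), thresholds

def hamming(code, codes):
    """Hamming distance from one code to every row of `codes`."""
    return np.count_nonzero(code != codes, axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))            # synthetic stand-in for image descriptors
Z = pca_embed(X, dim=8)                    # low-dimensional embedding
sats = rng.normal(size=(16, 8)) * Z.std()  # 16 random satellites -> 16-bit codes
codes, thr = satellite_codes(Z, sats)

# Rank database items by Hamming distance to the first item's code.
dists = hamming(codes[0], codes)
nearest = np.argsort(dists)[:5]
```

A data-dependent variant would choose the satellite locations from the data rather than at random. New queries would reuse the stored `thr` values, together with the mean and projection learned in step 1, which this sketch does not persist.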