CVDec 2, 2014

Hashing on Nonlinear Manifolds

Fumin Shen, Chunhua Shen, Qinfeng Shi, Anton van den Hengel, Zhenmin Tang, Heng Tao Shen

arXiv:1412.0826v1157 citations

Originality Incremental advance

AI Analysis

This addresses the problem of scalable similarity search and retrieval for applications like image classification, though it appears incremental as it builds on existing manifold learning and hashing techniques.

The paper tackles the problem of learning compact binary embeddings that preserve the intrinsic manifold structure of high-dimensional data, which previous manifold learning methods struggled with due to complexity and out-of-sample issues. The proposed approach enables hashing based on manifold learning like t-SNE, outperforming state-of-the-art methods on large-scale benchmarks and improving image classification with very short code lengths.

Learning based hashing methods have attracted considerable attention due to their ability to greatly increase the scale at which existing algorithms may operate. Most of these methods are designed to generate binary codes preserving the Euclidean similarity in the original space. Manifold learning techniques, in contrast, are better able to model the intrinsic structure embedded in the original high-dimensional data. The complexities of these models, and the problems with out-of-sample data, have previously rendered them unsuitable for application to large-scale embedding, however. In this work, how to learn compact binary embeddings on their intrinsic manifolds is considered. In order to address the above-mentioned difficulties, an efficient, inductive solution to the out-of-sample data problem, and a process by which non-parametric manifold learning may be used as the basis of a hashing method is proposed. The proposed approach thus allows the development of a range of new hashing techniques exploiting the flexibility of the wide variety of manifold learning approaches available. It is particularly shown that hashing on the basis of t-SNE outperforms state-of-the-art hashing methods on large-scale benchmark datasets, and is very effective for image classification with very short code lengths. The proposed hashing framework is shown to be easily improved, for example, by minimizing the quantization error with learned orthogonal rotations. In addition, a supervised inductive manifold hashing framework is developed by incorporating the label information, which is shown to greatly advance the semantic retrieval performance.

View on arXiv PDF

Similar