Deep Cross-Modal Hashing
This work addresses the need for efficient multimedia retrieval by improving cross-modal hashing. The contribution is incremental: it builds on existing methods by integrating deep learning.
The paper tackles the suboptimal performance of cross-modal hashing methods that rely on hand-crafted features. It proposes DCMH, an end-to-end deep learning framework that integrates feature learning and hash-code learning, and reports state-of-the-art performance on text-image datasets.
Due to its low storage cost and fast query speed, cross-modal hashing (CMH) has been widely used for similarity search in multimedia retrieval applications. However, almost all existing CMH methods are based on hand-crafted features, which might not be optimally compatible with the hash-code learning procedure. As a result, existing CMH methods with hand-crafted features may not achieve satisfactory performance. In this paper, we propose a novel cross-modal hashing method, called deep cross-modal hashing (DCMH), which integrates feature learning and hash-code learning into the same framework. DCMH is an end-to-end learning framework with deep neural networks, one for each modality, that performs feature learning from scratch. Experiments on two real datasets with text-image modalities show that DCMH can outperform other baselines and achieve state-of-the-art performance in cross-modal retrieval applications.
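To make the retrieval setting concrete, the following is a minimal sketch of how a query from one modality can be matched against binary codes from another using Hamming distance. It is not the DCMH training procedure. It assumes each modality's network has already produced real-valued outputs, and that codes are obtained by taking the sign of each output, which is a common choice in hashing methods. The function names (`to_code`, `hamming`, `rank_by_hamming`) and the example activations are hypothetical.

```python
from typing import List, Sequence


def to_code(activations: Sequence[float]) -> int:
    """Binarize real-valued outputs by sign and pack the bits into an int."""
    code = 0
    for value in activations:
        # Assumption: non-negative outputs map to bit 1, negative to bit 0.
        code = (code << 1) | (1 if value >= 0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Count the bits in which two packed codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, database_codes: Sequence[int]) -> List[int]:
    """Return database indices ordered by increasing Hamming distance."""
    return sorted(
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )


# Hypothetical 4-bit outputs: a text query and three image database entries.
text_query = to_code([0.9, -0.2, 0.4, -0.7])
image_db = [
    to_code([-0.5, 0.3, -0.1, 0.8]),
    to_code([0.7, -0.6, 0.2, -0.1]),
    to_code([0.1, -0.3, -0.4, -0.9]),
]
ranking = rank_by_hamming(text_query, image_db)
print(ranking)
```

Because each item is stored as a short bit string and comparison reduces to an XOR followed by a bit count, this illustrates where the low storage cost and fast query speed of hashing-based retrieval come from.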