IR · Feb 2, 2019

Joint Cluster Unary Loss for Efficient Cross-Modal Hashing

arXiv:1902.00644v1 · 1.76 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of efficient cross-modal retrieval for large-scale multimodal data, offering an incremental improvement over existing hashing methods by reducing training complexity.
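
The efficiency of hashing-based retrieval comes from comparing compact binary codes with XOR and a bit count instead of comparing real-valued features. The sketch below is a generic illustration of Hamming-distance ranking, not code from the paper. The 8-bit codes, the `to_code`/`hamming`/`rank` helpers and the image-to-text setup are all hypothetical.

```python
def to_code(bits):
    """Pack a list of 0/1 bits into an int so Hamming distance is XOR + popcount."""
    code = 0
    for b in bits:
        code = (code << 1) | b
    return code


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def rank(query, database):
    """Database indices sorted by Hamming distance to the query (ties broken by index)."""
    return sorted(range(len(database)), key=lambda i: (hamming(query, database[i]), i))


# Hypothetical 8-bit codes: an image query searched against text codes.
# Both modalities live in the same Hamming space, so they compare directly.
image_query = to_code([1, 0, 1, 1, 0, 0, 1, 0])
text_db = [
    to_code([1, 0, 1, 1, 0, 0, 1, 1]),  # distance 1
    to_code([0, 1, 0, 0, 1, 1, 0, 1]),  # distance 8 (bitwise complement)
    to_code([1, 0, 1, 0, 0, 0, 1, 0]),  # distance 1
    to_code([1, 1, 1, 1, 0, 0, 0, 0]),  # distance 2
]
print(rank(image_query, text_db))  # [0, 2, 3, 1]
```

Cross-modal hashing methods differ in how they *learn* these codes. The search step itself stays this cheap.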

The paper tackles the inefficiency of training cross-modal hashing methods due to high computational complexity from pairwise or triplet losses, proposing a novel unary loss with O(n) complexity and a joint cluster hashing algorithm. Experiments on large-scale datasets show the method is superior or comparable to state-of-the-art methods in performance and more efficient in training.
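
To make the complexity gap concrete, the sketch below counts how many loss terms each training scheme enumerates for n samples. It is plain arithmetic, not the paper's code. As an assumption, the triplet count is the upper bound of ordered (anchor, positive, negative) index triples before any filtering by labels.

```python
from math import comb


def num_pairs(n):
    # Unordered pairs (i, j) with i < j: n(n-1)/2, i.e. O(n^2).
    return comb(n, 2)


def num_triplets(n):
    # Ordered (anchor, positive, negative) with distinct indices: n(n-1)(n-2), i.e. O(n^3).
    # Upper bound: label-based filtering removes some triples but keeps the order.
    return n * (n - 1) * (n - 2)


def num_unary(n):
    # One loss term per sample: O(n).
    return n


n = 10_000
print(num_pairs(n), num_triplets(n), num_unary(n))
# 49995000 999700020000 10000
```

At ten thousand samples, the unary scheme needs roughly five thousand times fewer terms than the pairwise one and about a hundred million times fewer than the triplet one. This gap is the motivation for an O(n) loss.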

With the rapid growth of various types of multimodal data, cross-modal deep hashing has received broad attention as an efficient solution to cross-modal retrieval problems. Most cross-modal hashing methods follow the traditional supervised hashing framework, in which $O(n^2)$ data pairs or $O(n^3)$ data triplets are generated for training, but this makes training inefficient because the complexity is high for large-scale datasets. To address these issues, we propose a novel and efficient cross-modal hashing algorithm built on a unary loss. First, we introduce the Cross-Modal Unary Loss (CMUL) with $O(n)$ complexity to bridge the traditional triplet loss and the classification-based unary loss. CMUL also provides a tighter bound of the triplet loss for structured multilabel data. Second, we propose the novel Joint Cluster Cross-Modal Hashing (JCCH) algorithm for efficient hash learning, which incorporates CMUL. The resulting hashcodes form several clusters in which hashcodes in the same cluster share similar semantic information, and the heterogeneity gap between modalities is diminished because the modalities share the same clusters. The proposed algorithm can be applied to various types of data, and experiments on large-scale datasets show that the proposed method is superior to or comparable with state-of-the-art cross-modal hashing methods, while training with the proposed method is more efficient than with the others.
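
The idea of a unary loss over shared clusters can be illustrated with a generic center-based softmax loss. This is a simplified stand-in, **not** the paper's CMUL or JCCH. It assumes single-label data, fixed cluster centers, and relaxed (for example tanh-activated) codes. The helper names, centers and codes are all hypothetical. Each sample is scored only against the K shared centers, so the cost grows linearly with n. Because image and text codes are pulled toward the same centers, the two modalities end up aligned in the same Hamming space.

```python
import math


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def unary_center_loss(codes, labels, centers):
    """Mean softmax cross-entropy over negative squared distances to centers.

    Cost is O(n * K * bits): linear in n, with no pairs or triplets formed.
    Simplified single-label stand-in, not the paper's multilabel CMUL bound.
    """
    total = 0.0
    for h, y in zip(codes, labels):
        logits = [-sq_dist(h, c) for c in centers]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_z - logits[y]
    return total / len(codes)


def binarize(h):
    """Map a relaxed code to hash bits by sign (>= 0 -> 1)."""
    return [1 if v >= 0 else 0 for v in h]


# Hypothetical 4-bit setup: two cluster centers shared by BOTH modalities.
centers = [[1.0, 1.0, -1.0, -1.0], [-1.0, -1.0, 1.0, 1.0]]
image_codes = [[0.9, 0.8, -0.7, -0.9], [-0.8, -0.9, 0.9, 0.7]]
text_codes = [[0.7, 0.9, -0.8, -0.6], [-0.9, -0.7, 0.8, 0.9]]
labels = [0, 1]

loss = (unary_center_loss(image_codes, labels, centers)
        + unary_center_loss(text_codes, labels, centers))
wrong = unary_center_loss(image_codes, [1, 0], centers)
print(loss < wrong)              # True: codes near their own center score low
print(binarize(image_codes[0]))  # [1, 1, 0, 0]
print(binarize(text_codes[0]))   # [1, 1, 0, 0], same bits across modalities
```

In a real model, `centers` would be learned or derived from clustering, and the loss would be backpropagated into the image and text networks. This sketch only shows why the per-sample cost stays constant.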
