IR · Apr 25, 2019

Fusion-supervised Deep Cross-modal Hashing

Li Wang, Lei Zhu, En Yu, Jiande Sun, Huaxiang Zhang

arXiv:1904.11171v23.118 citations

Originality: Incremental advance

AI Analysis

This work addresses cross-modal retrieval for applications like multimedia search, but it appears incremental as it builds on existing deep hashing methods with specific enhancements.

The paper tackles the problem of cross-modal retrieval by proposing a fusion-supervised deep hashing method that learns unified binary codes to capture heterogeneous multi-modal correlations and embed semantic information, achieving state-of-the-art performance on two benchmark datasets.

Deep hashing has recently received attention in cross-modal retrieval for its impressive advantages. However, existing hashing methods for cross-modal retrieval cannot fully capture the heterogeneous multi-modal correlation and exploit the semantic information. In this paper, we propose a novel Fusion-supervised Deep Cross-modal Hashing (FDCH) approach. Firstly, FDCH learns unified binary codes through a fusion hash network with paired samples as input, which effectively enhances the modeling of the correlation of heterogeneous multi-modal data. Then, these high-quality unified hash codes further supervise the training of the modality-specific hash networks for encoding out-of-sample queries. Meanwhile, both pair-wise similarity information and classification information are embedded in the hash networks under a one-stream framework, which simultaneously preserves cross-modal similarity and keeps semantic consistency. Experimental results on two benchmark datasets demonstrate the state-of-the-art performance of FDCH.
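
The abstract does not give the loss functions, so the following is only a rough NumPy sketch of the two ideas it names: a pair-wise similarity objective on real-valued codes, and unified binary codes that supervise a modality-specific network. The function names, the label-overlap definition of similarity, the negative log-likelihood form (borrowed from common deep hashing practice), and the squared-error supervision term are all assumptions for illustration, not the authors' formulation.

```python
import numpy as np

def similarity_matrix(labels):
    """Assumed definition: S[i, j] = 1 if samples i and j share a label, else 0."""
    return (labels @ labels.T > 0).astype(np.float64)

def pairwise_nll(h, s):
    """Pair-wise negative log-likelihood on real-valued codes h (n x bits).

    Assumed form, common in deep hashing: sum over pairs of
    log(1 + exp(theta_ij)) - s_ij * theta_ij with theta = 0.5 * h h^T.
    """
    theta = 0.5 * h @ h.T
    # logaddexp(0, theta) is a numerically stable log(1 + exp(theta)).
    return np.sum(np.logaddexp(0.0, theta) - s * theta)

def unified_codes(h_fused):
    """Binarize the fusion network's output into codes in {-1, +1}."""
    b = np.sign(h_fused)
    b[b == 0] = 1  # map exact zeros to +1 so every bit is defined
    return b

def supervision_loss(h_modality, b):
    """Assumed squared-error term pulling a modality-specific output toward b."""
    return np.sum((h_modality - b) ** 2)

# Toy usage with random stand-ins for network outputs (hypothetical data).
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=(4, 3))   # 4 paired samples, 3 classes
h_fused = rng.standard_normal((4, 8))      # fusion network output, 8 bits
h_image = rng.standard_normal((4, 8))      # image-network output
s = similarity_matrix(labels)
b = unified_codes(h_fused)
total = pairwise_nll(h_fused, s) + supervision_loss(h_image, b)
```

In this sketch the fusion output is trained against the similarity matrix, and its binarized codes then act as fixed targets for the image (or text) network; the classification term mentioned in the abstract is omitted.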
