CV MM Apr 13, 2023

Deep Metric Multi-View Hashing for Multimedia Retrieval

Jian Zhu, Zhangmin Huang, Xiaohu Ruan, Yu Cui, Yongli Cheng, Lingfang Zeng

arXiv:2304.06358v15.911 citations h-index: 23

Originality: Incremental advance

AI Analysis

This work improves retrieval precision for multimedia systems, though it appears incremental as it builds on existing multi-view hashing methods.

The paper addresses two weaknesses in learning hash representations for multi-view heterogeneous data in multimedia retrieval: ineffective fusion of multi-view features and underuse of the metric information carried by dissimilar samples. It reports an improvement of up to 15.28 mean Average Precision (mAP) over state-of-the-art methods on MIR-Flickr25K, MS COCO, and NUS-WIDE.

Learning the hash representation of multi-view heterogeneous data is an important task in multimedia retrieval. However, existing methods fail to effectively fuse the multi-view features and to utilize the metric information provided by dissimilar samples, leading to limited retrieval precision. Current methods use a weighted sum or concatenation to fuse the multi-view features. We argue that these fusion methods cannot capture the interaction among different views. Furthermore, these methods ignore the information provided by dissimilar samples. We propose a novel deep metric multi-view hashing (DMMVH) method to address these problems. Extensive empirical evidence is presented to show that gate-based fusion is better than typical methods. We introduce deep metric learning to the multi-view hashing problem, which makes it possible to utilize the metric information of dissimilar samples. On the MIR-Flickr25K, MS COCO, and NUS-WIDE datasets, our method outperforms the current state-of-the-art methods by a large margin (up to 15.28 mean Average Precision (mAP) improvement).
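
The abstract names two ideas without giving formulas: gate-based fusion in place of a weighted sum or concatenation, and a metric loss that also uses dissimilar pairs. The sketch below illustrates both in plain Python under stated assumptions; it is not the DMMVH implementation. The element-wise gate, the squared contrastive-style loss, and every name and parameter (`gated_fusion`, `gate_weights`, `gate_bias`, `margin`, and so on) are hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(image_feat, text_feat, gate_weights, gate_bias):
    """Fuse two equal-length view features with an element-wise gate.

    Assumption: a simplified per-dimension gate. gate_weights[i] is a
    hypothetical pair (w_img, w_txt) that would be learned in practice.
    Unlike a fixed weighted sum, the mixing weight g depends on both inputs.
    """
    fused = []
    for x, y, (w_x, w_y), b in zip(image_feat, text_feat, gate_weights, gate_bias):
        g = sigmoid(w_x * x + w_y * y + b)  # gate computed from both views
        fused.append(g * x + (1.0 - g) * y)  # convex combination of the views
    return fused


def to_hash_code(features):
    """Binarize real-valued features into a {-1, +1} code by sign."""
    return [1 if v >= 0 else -1 for v in features]


def hamming_distance(code_a, code_b):
    """Count the positions where two codes differ."""
    return sum(1 for p, q in zip(code_a, code_b) if p != q)


def contrastive_loss(code_a, code_b, similar, margin):
    """Contrastive-style loss on a pair of codes (an assumed stand-in).

    Similar pairs are penalized by their squared distance; dissimilar pairs
    are penalized only while they are closer than `margin`, which is how
    dissimilar samples contribute metric information.
    """
    d = hamming_distance(code_a, code_b)
    if similar:
        return float(d ** 2)
    return float(max(0, margin - d) ** 2)


# Usage with made-up feature values.
image_feat = [0.8, -0.4, 0.3, -0.9]
text_feat = [0.5, 0.6, -0.7, -0.2]
gate_weights = [(1.0, -1.0)] * 4
gate_bias = [0.0] * 4

fused = gated_fusion(image_feat, text_feat, gate_weights, gate_bias)
code = to_hash_code(fused)
```

In this sketch a dissimilar pair whose codes already differ in at least `margin` bits contributes zero loss, so training pressure concentrates on dissimilar pairs that are still too close.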
