MM · Mar 10, 2018

Deep Cross-media Knowledge Transfer

arXiv:1803.03777v1 · 47 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of data labeling in cross-media retrieval for multimedia applications, but it is incremental as it builds on existing transfer learning methods.

The paper tackles the problem of cross-media retrieval with limited labeled data by proposing a deep cross-media knowledge transfer (DCKT) approach that transfers knowledge from a large-scale dataset to a small-scale one, achieving promising improvements in retrieval accuracy.

Cross-media retrieval is a research hotspot in the multimedia area; it aims to perform retrieval across different media types such as image and text. The performance of existing methods usually relies on labeled data for model training. However, cross-media data is very labor-intensive to collect and label, so how to transfer valuable knowledge from existing data to new data is a key problem on the way to practical application. To this end, this paper proposes the deep cross-media knowledge transfer (DCKT) approach, which transfers knowledge from a large-scale cross-media dataset to promote model training on another small-scale cross-media dataset. The main contributions of DCKT are: (1) A two-level transfer architecture is proposed to jointly minimize the media-level and correlation-level domain discrepancies, which allows two important and complementary aspects of knowledge to be transferred: intra-media semantic knowledge and inter-media correlation knowledge. This enriches the training information and boosts retrieval accuracy. (2) A progressive transfer mechanism is proposed to iteratively select training samples in order of ascending transfer difficulty, using the metric of cross-media domain consistency with adaptive feedback. This drives the transfer process to gradually reduce the vast cross-media domain discrepancy, enhancing the robustness of model training. To verify the effectiveness of DCKT, we take the large-scale dataset XMediaNet as the source domain and 3 widely-used datasets as target domains for cross-media retrieval. Experimental results show that DCKT achieves promising improvement in retrieval accuracy.
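
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The abstract does not name a discrepancy measure, so the sketch swaps in a simple linear-kernel maximum mean discrepancy (squared distance between feature means). It also assumes image and text features already live in a common space of equal dimension, and that a pair can be represented by the element-wise product of its two features. The function names (`linear_mmd`, `two_level_discrepancy`, `progressive_selection`) and the fixed `consistency` scores are hypothetical; in the described approach the consistency metric is updated with adaptive feedback during training.

```python
import numpy as np

def linear_mmd(source, target):
    """Squared distance between the feature means of two domains
    (a linear-kernel MMD, used here as a stand-in discrepancy)."""
    diff = source.mean(axis=0) - target.mean(axis=0)
    return float(diff @ diff)

def two_level_discrepancy(src_img, src_txt, tgt_img, tgt_txt):
    """Sum of media-level and correlation-level discrepancies.
    Rows of *_img and *_txt are assumed to be paired samples."""
    # Media level: compare each media type across the two domains.
    media = linear_mmd(src_img, tgt_img) + linear_mmd(src_txt, tgt_txt)
    # Correlation level: compare image-text pair representations.
    # The element-wise product is an illustrative assumption.
    corr = linear_mmd(src_img * src_txt, tgt_img * tgt_txt)
    return media + corr

def progressive_selection(consistency, num_rounds):
    """Yield growing index sets, starting from the samples with the
    highest domain consistency (assumed easiest to transfer)."""
    order = np.argsort(-np.asarray(consistency))  # highest score first
    n = len(order)
    for r in range(1, num_rounds + 1):
        k = int(np.ceil(n * r / num_rounds))
        yield order[:k]

if __name__ == "__main__":
    scores = np.array([0.1, 0.9, 0.5, 0.7])  # made-up consistency scores
    for round_id, idx in enumerate(progressive_selection(scores, 2), 1):
        print(round_id, idx.tolist())
```

In a training loop, each round would fit the model on the selected subset while adding `two_level_discrepancy` as a loss term, so harder samples enter only after the domains have been pulled closer together.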
