CV · Nov 28, 2014

Cross-Modal Learning via Pairwise Constraints

Ran He, Man Zhang, Liang Wang, Ye Ji, Qiyue Yin

arXiv:1411.7798v1 · 66 citations

Originality: Incremental advance

AI Analysis

This paper addresses cross-modal learning for multimedia applications and offers incremental improvements in clustering and retrieval accuracy.

The paper tackles the problem of learning common structures across text and image modalities using pairwise constraints, proposing unsupervised subspace clustering and supervised matching methods that reduce the semantic gap and improve clustering/retrieval accuracy.

In multimedia applications, the text and image components in a web document form a pairwise constraint that potentially indicates the same semantic concept. This paper studies cross-modal learning via the pairwise constraint, and aims to find the common structure hidden in different modalities. We first propose a compound regularization framework to deal with the pairwise constraint, which can be used as a general platform for developing cross-modal algorithms. For unsupervised learning, we propose a cross-modal subspace clustering method to learn a common structure for different modalities. For supervised learning, to reduce the semantic gap and the outliers in pairwise constraints, we propose a cross-modal matching method based on compound ℓ2,1 regularization along with an iteratively reweighted algorithm to find the global optimum. Extensive experiments demonstrate the benefits of joint text and image modeling with semantically induced pairwise constraints, and show that the proposed cross-modal methods can further reduce the semantic gap between different modalities and improve the clustering/retrieval accuracy.
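The abstract mentions ℓ2,1 regularization solved by an iteratively reweighted algorithm. As a rough illustration of that general idea only, the sketch below applies iteratively reweighted least squares to a plain ℓ2,1-regularized regression, min_W ||XW - Y||_F^2 + lam * sum_i ||w_i||_2, where w_i is the i-th row of W. This is not the paper's compound objective or its cross-modal matching method; the function names (`l21_norm`, `irls_l21`), the ridge initialization, and the parameters `lam`, `n_iter`, and `eps` are all assumptions made for the example.

```python
import numpy as np


def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (the l2,1 norm)."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def irls_l21(X, Y, lam=1.0, n_iter=50, eps=1e-8):
    """Hypothetical helper: iteratively reweighted least squares for
    min_W ||X W - Y||_F^2 + lam * ||W||_{2,1}.

    Each pass fixes a diagonal weight matrix D built from the current
    row norms of W, then solves the resulting ridge-like linear system.
    """
    d = X.shape[1]
    gram = X.T @ X
    rhs = X.T @ Y
    # Assumed initialization: an ordinary ridge solution.
    W = np.linalg.solve(gram + lam * np.eye(d), rhs)
    history = []
    for _ in range(n_iter):
        row_norms = np.linalg.norm(W, axis=1)
        # Reweighting step: D_ii = 1 / (2 * ||w_i||), clipped to avoid division by zero.
        D = np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))
        # Weighted least-squares step with D held fixed.
        W = np.linalg.solve(gram + lam * D, rhs)
        history.append(float(np.linalg.norm(X @ W - Y) ** 2 + lam * l21_norm(W)))
    return W, history
```

Calling `irls_l21(X, Y, lam=5.0)` on a feature matrix `X` (samples by features) and a target matrix `Y` (samples by outputs) returns the weight matrix and the objective value after each pass. Because this simplified objective is convex, the reweighting scheme is a reasonable way to approach its global minimum; rows of W with small norms receive large weights and are pushed toward zero, which is the row-sparsity effect that ℓ2,1 penalties are typically used for.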
