CV · LG · May 31, 2016

Generalized Multi-view Embedding for Visual Recognition and Cross-modal Retrieval

arXiv:1605.09696v3 · 91 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of integrating multiple visual views and modalities for improved recognition and retrieval, representing an incremental advancement in multi-view learning methods.

The paper tackles multi-view embedding from different visual cues and modalities by proposing a unified solution for subspace learning methods based on the Rayleigh quotient. The solution extends to multiple views, supervised learning, and non-linear embeddings, and it outperforms related methods on visual object recognition and cross-modal image retrieval.

In this paper, the problem of multi-view embedding from different visual cues and modalities is considered. We propose a unified solution for subspace learning methods using the Rayleigh quotient, which is extensible for multiple views, supervised learning, and non-linear embeddings. Numerous methods including Canonical Correlation Analysis, Partial Least Squares regression and Linear Discriminant Analysis are studied using specific intrinsic and penalty graphs within the same framework. Non-linear extensions based on kernels and (deep) neural networks are derived, achieving better performance than the linear ones. Moreover, a novel Multi-view Modular Discriminant Analysis (MvMDA) is proposed by taking the view difference into consideration. We demonstrate the effectiveness of the proposed multi-view embedding methods on visual object recognition and cross-modal image retrieval, and obtain superior results in both applications compared to related methods.
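
As a rough illustration of the Rayleigh-quotient formulation the abstract refers to, the sketch below poses ordinary two-view linear CCA as a generalized eigenvalue problem A w = λ B w, where A holds the between-view covariance and B the within-view covariances. This is the standard textbook construction rather than the paper's own code or its MvMDA method; the function name `cca_rayleigh`, the ridge term `reg`, and the synthetic data are assumptions made for the example.

```python
import numpy as np


def cca_rayleigh(X, Y, reg=1e-6):
    """Two-view linear CCA as a generalized eigenproblem A w = lambda B w.

    Hypothetical helper for illustration only. `reg` is an assumed ridge
    term that keeps the within-view covariances positive definite.
    """
    # Center each view so the products below are covariances.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n, dx = X.shape
    dy = Y.shape[1]

    Cxx = X.T @ X / (n - 1) + reg * np.eye(dx)
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(dy)
    Cxy = X.T @ Y / (n - 1)

    # A: between-view covariance (numerator of the Rayleigh quotient).
    A = np.block([[np.zeros((dx, dx)), Cxy],
                  [Cxy.T, np.zeros((dy, dy))]])
    # B: within-view covariance (denominator of the Rayleigh quotient).
    B = np.block([[Cxx, np.zeros((dx, dy))],
                  [np.zeros((dy, dx)), Cyy]])

    # Reduce to a symmetric standard eigenproblem via Cholesky B = L L^T:
    # with v = L^T w, the problem becomes (L^-1 A L^-T) v = lambda v.
    L = np.linalg.cholesky(B)
    L_inv = np.linalg.inv(L)
    M = L_inv @ A @ L_inv.T
    evals, evecs = np.linalg.eigh(M)

    # Sort by decreasing eigenvalue and map back to the original coordinates.
    order = np.argsort(evals)[::-1]
    evals = evals[order]
    W = L_inv.T @ evecs[:, order]
    return evals, W[:dx], W[dx:]


# Synthetic example: both views share one latent signal z.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 2))])
Y = np.hstack([rng.normal(size=(500, 1)), z + 0.1 * rng.normal(size=(500, 1))])

evals, Wx, Wy = cca_rayleigh(X, Y)
print("largest eigenvalue (top canonical correlation):", evals[0])
```

In this formulation the positive eigenvalues are the canonical correlations, and the first columns of `Wx` and `Wy` give the projection directions for each view. Changing how A and B are built is what lets a single solver cover other subspace methods, which is the sense in which the abstract describes a unified framework.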
