CV, MM · Jul 28, 2024

Detached and Interactive Multimodal Learning

arXiv:2407.19514v1 · 14 citations · h-index: 22 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in multimodal learning that matters to both researchers and practitioners: modality competition. It appears incremental, however, since it builds on existing frameworks and adds new components to them.

The paper tackles the modality competition problem in multimodal learning by proposing DI-MML, a detached framework that trains each modality encoder separately and encourages cross-modal interaction through a shared classifier and a contrastive loss. It achieves superior performance on audio-visual, flow-image, and front-rear view datasets.
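To make the contrast between a single joint objective and detached, per-modality objectives concrete, here is a minimal sketch in plain Python. It is not the DI-MML implementation: the two-modality setup, the linear `shared_classifier`, the toy features, and the summed-logit `joint_loss` baseline are all hypothetical simplifications. The sketch shows how, when one modality (here audio) already fits the label, the joint loss is small and the weaker modality receives little gradient, whereas a detached objective gives that modality its own learning signal.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])

def logit_grad(logits, label):
    # Gradient of cross-entropy w.r.t. the logits: softmax(z) - one_hot(label).
    return [p - (1.0 if k == label else 0.0) for k, p in enumerate(softmax(logits))]

def shared_classifier(feature, weight, bias):
    # HYPOTHETICAL linear head; the same weight/bias is applied to every modality,
    # so all modalities are scored in one common feature space.
    return [sum(w * x for w, x in zip(row, feature)) + b for row, b in zip(weight, bias)]

def detached_losses(features, label, weight, bias):
    # Isolated objective per modality: each loss depends only on that modality's feature.
    return {m: cross_entropy(shared_classifier(f, weight, bias), label)
            for m, f in features.items()}

def joint_loss(features, label, weight, bias):
    # Simplified joint baseline (assumption): sum per-modality logits, then one loss.
    per_modality = [shared_classifier(f, weight, bias) for f in features.values()]
    fused = [sum(zs) for zs in zip(*per_modality)]
    return cross_entropy(fused, label), fused

# Toy example: audio already fits the label well, visual only weakly.
features = {"audio": [1.0, 0.0], "visual": [0.2, 0.1]}
weight = [[2.0, 0.0], [0.0, 2.0]]  # 2 classes x 2 feature dims
bias = [0.0, 0.0]
label = 0

detached = detached_losses(features, label, weight, bias)
joint, fused = joint_loss(features, label, weight, bias)

# Under the joint loss, visual's logits receive the gradient of the fused loss;
# under the detached loss, they receive the gradient of visual's own loss.
visual_grad_joint = logit_grad(fused, label)
visual_grad_detached = logit_grad(shared_classifier(features["visual"], weight, bias), label)

print(detached)              # audio ≈ 0.127, visual ≈ 0.598
print(joint)                 # ≈ 0.105
print(visual_grad_joint)     # ≈ [-0.100, 0.100]
print(visual_grad_detached)  # ≈ [-0.450, 0.450]
```

Summing logits in the baseline makes the visual logits' gradient identical to the fused gradient, so visual only inherits the small residual left after audio has fit the label; the detached loss removes that coupling.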

Recently, Multimodal Learning (MML) has gained significant interest because it compensates for the limitations of single modalities by exploiting the complementary information in multimodal data. However, traditional MML methods generally adopt a joint learning framework with a uniform learning objective, which can lead to modality competition: feedback comes predominantly from certain modalities, limiting the full potential of the others. In response to this challenge, this paper introduces DI-MML, a novel detached MML framework designed to learn complementary information across modalities while avoiding modality competition. Specifically, DI-MML addresses competition by training each modality encoder separately with an isolated learning objective. It further encourages cross-modal interaction through a shared classifier that defines a common feature space and a dimension-decoupled unidirectional contrastive (DUC) loss that facilitates modality-level knowledge transfer. Additionally, to account for the varying reliability of sample pairs, we devise a certainty-aware logit weighting strategy that leverages complementary information at the instance level during inference. Extensive experiments on audio-visual, flow-image, and front-rear view datasets show the superior performance of our proposed method. The code is released at https://github.com/fanyunfeng-bit/DI-MML.
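The certainty-aware logit weighting used at inference can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's formulation: it hypothetically measures a modality's certainty as its maximum softmax probability for the sample, normalises the certainties across modalities into weights, and fuses the per-modality logits as a weighted sum. The function names and the toy logits are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def certainty(logits):
    # ASSUMPTION: certainty = maximum softmax probability; DI-MML's measure may differ.
    return max(softmax(logits))

def certainty_weighted_fusion(logits_by_modality):
    # Fuse one sample's per-modality logits, weighting each modality by its
    # certainty normalised to sum to 1 across modalities (hypothetical scheme).
    certs = {m: certainty(z) for m, z in logits_by_modality.items()}
    total = sum(certs.values())
    weights = {m: c / total for m, c in certs.items()}
    num_classes = len(next(iter(logits_by_modality.values())))
    fused = [sum(weights[m] * z[k] for m, z in logits_by_modality.items())
             for k in range(num_classes)]
    return fused, weights

# One sample where audio is ambiguous but visual is confident about class 0.
sample = {"audio": [0.1, 0.0, 0.2], "visual": [3.0, 0.0, 0.0]}
fused, weights = certainty_weighted_fusion(sample)
prediction = max(range(len(fused)), key=fused.__getitem__)
print(weights)     # visual receives the larger weight (≈ 0.71 vs ≈ 0.29)
print(prediction)  # 0
```

Because the weights are recomputed for every sample, a modality that is reliable on one pair can dominate there while deferring to the other modality on a different pair, which is the instance-level behaviour the abstract describes.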

Code Implementations: 1 repo