CV · CL · May 16, 2023

UniS-MMC: Multimodal Classification via Unimodality-supervised Multimodal Contrastive Learning

arXiv:2305.09299v1 · 224 citations
Originality: Incremental advance
AI Analysis

This paper addresses multimodal classification for tasks such as image-text analysis, but it appears incremental because it builds on existing contrastive learning approaches.

The paper tackles the problem of multimodal learning by proposing a contrastive method that uses unimodal predictions as weak supervision to align representations, achieving state-of-the-art performance on image-text classification benchmarks UPMC-Food-101 and N24News.

Multimodal learning aims to imitate how humans acquire complementary information from multiple modalities for various downstream tasks. However, traditional aggregation-based multimodal fusion methods ignore the inter-modality relationship, treat each modality equally, and suffer from sensor noise, all of which reduce multimodal learning performance. In this work, we propose a novel multimodal contrastive method to explore more reliable multimodal representations under the weak supervision of unimodal prediction. Specifically, we first capture task-related unimodal representations and unimodal predictions from an introduced unimodal prediction task. Then the unimodal representations are aligned with the more effective one by the designed multimodal contrastive method under the supervision of the unimodal predictions. Experimental results with fused features on two image-text classification benchmarks, UPMC-Food-101 and N24News, show that our proposed Unimodality-Supervised MultiModal Contrastive (UniS-MMC) learning method outperforms current state-of-the-art multimodal methods. The detailed ablation study and analysis further demonstrate the advantage of our proposed method.
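
The idea of letting unimodal predictions supervise the alignment can be illustrated with a minimal sketch. The abstract does not give the loss, so everything below is an assumption made for illustration: the three pair names, the function names `pair_type` and `alignment_loss`, and the cosine-based stand-in loss are hypothetical and are not taken from the paper or its repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_type(img_pred, txt_pred, label):
    """Classify an image-text pair by which unimodal predictions are correct.

    Assumed scheme (hypothetical naming):
      both correct     -> "positive"
      exactly one      -> "semi-positive"
      neither correct  -> "negative"
    """
    img_ok = img_pred == label
    txt_ok = txt_pred == label
    if img_ok and txt_ok:
        return "positive"
    if img_ok or txt_ok:
        return "semi-positive"
    return "negative"


def alignment_loss(img_vec, txt_vec, kind):
    """Stand-in loss: pull the two representations together unless both
    unimodal predictions were wrong, in which case penalize similarity."""
    sim = cosine(img_vec, txt_vec)
    if kind == "negative":
        return max(0.0, sim)
    return 1.0 - sim


# Example: the image classifier is right, the text classifier is wrong.
kind = pair_type(img_pred=3, txt_pred=1, label=3)
loss = alignment_loss([1.0, 0.0], [0.0, 1.0], kind)
```

In a real training loop the gradient for a "semi-positive" pair would presumably flow only into the weaker modality, so that it moves toward the more effective one; this sketch leaves that detail out.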

Code Implementations: 1 repo