LGAICLMar 31, 2023

Self-Supervised Multimodal Learning: A Survey

arXiv:2304.01008v3121 citationsh-index: 77Has Code
Originality Synthesis-oriented
AI Analysis

It addresses the annotation bottleneck in multimodal learning for researchers and practitioners, but is incremental as it surveys existing work rather than introducing new methods.

This survey reviews self-supervised multimodal learning (SSML), which tackles the problem of learning from raw multimodal data without expensive human annotations, summarizing state-of-the-art methods, challenges, and applications in fields like healthcare and remote sensing.

Multimodal learning, which aims to understand and analyze information from multiple modalities, has achieved substantial progress in the supervised regime in recent years. However, the heavy dependence on data paired with expensive human annotations impedes scaling up models. Meanwhile, given the availability of large-scale unannotated data in the wild, self-supervised learning has become an attractive strategy to alleviate the annotation bottleneck. Building on these two directions, self-supervised multimodal learning (SSML) provides ways to learn from raw multimodal data. In this survey, we provide a comprehensive review of the state-of-the-art in SSML, in which we elucidate three major challenges intrinsic to self-supervised learning with multimodal data: (1) learning representations from multimodal data without labels, (2) fusion of different modalities, and (3) learning with unaligned data. We then detail existing solutions to these challenges. Specifically, we consider (1) objectives for learning from multimodal unlabeled data via self-supervision, (2) model architectures from the perspective of different multimodal fusion strategies, and (3) pair-free learning strategies for coarse-grained and fine-grained alignment. We also review real-world applications of SSML algorithms in diverse fields such as healthcare, remote sensing, and machine translation. Finally, we discuss challenges and future directions for SSML. A collection of related resources can be found at: https://github.com/ys-zong/awesome-self-supervised-multimodal-learning.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes