CV · Mar 9, 2021

SMIL: Multimodal Learning with Severely Missing Modality

arXiv:2103.05677v1 · 406 citations · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses a critical limitation in multimodal learning: in many applications, training data is incomplete. The contribution is incremental, however, because it builds on existing methods for handling missing modalities.

The paper tackles multimodal learning when modalities are severely missing from the training data (e.g., 90% of training examples are incomplete). It proposes SMIL, a method based on Bayesian meta-learning, and reports state-of-the-art performance on three benchmarks: MM-IMDb, CMU-MOSI, and avMNIST.

A common assumption in multimodal learning is the completeness of training data, i.e., full modalities are available in all training examples. Although existing research has developed methods to handle incomplete testing data, e.g., modalities that are partially missing in testing examples, few of these methods can handle incomplete training modalities. The problem becomes even more challenging in the severely missing case, e.g., when 90% of training examples have incomplete modalities. For the first time in the literature, this paper formally studies multimodal learning with missing modality in terms of flexibility (missing modalities in training, testing, or both) and efficiency (most training data have incomplete modality). Technically, we propose a new method named SMIL that leverages Bayesian meta-learning to achieve both objectives in a unified way. To validate our idea, we conduct a series of experiments on three popular benchmarks: MM-IMDb, CMU-MOSI, and avMNIST. The results demonstrate the state-of-the-art performance of SMIL over existing methods and generative baselines including autoencoders and generative adversarial networks. Our code is available at https://github.com/mengmenm/SMIL.
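To make the "severely missing" setting concrete, the sketch below builds a toy two-modality dataset and drops one modality from 90% of the training examples. It only illustrates the data regime described in the abstract; it is not SMIL and not code from the authors' repository. The helper `mask_modality`, the field names `image`, `text`, and `label`, and the toy features are all hypothetical.

```python
import random


def mask_modality(examples, missing_rate, modality, seed=0):
    """Return a copy of `examples` with `modality` set to None in a fixed
    fraction of items. Hypothetical helper for illustration only."""
    rng = random.Random(seed)
    n_missing = round(missing_rate * len(examples))
    missing_idx = set(rng.sample(range(len(examples)), n_missing))
    masked = []
    for i, ex in enumerate(examples):
        ex = dict(ex)  # shallow copy so the input list is left untouched
        if i in missing_idx:
            ex[modality] = None  # mark the modality as unavailable
        masked.append(ex)
    return masked


# Toy dataset: 100 examples with placeholder image and text features.
data = [
    {"image": [i, i + 1], "text": f"sample {i}", "label": i % 2}
    for i in range(100)
]

# Severely missing case: the text modality is absent in 90% of training examples.
train = mask_modality(data, missing_rate=0.9, modality="text")
n_complete = sum(ex["text"] is not None for ex in train)
print(n_complete)  # 10 of 100 examples keep both modalities
```

Applying the same masking to a test split, or to both splits, would cover the three "flexibility" cases the abstract lists: missing modalities in training, in testing, or in both.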

Code Implementations: 1 repo
