CL, AI, LG | Oct 23, 2022

MM-Align: Learning Optimal Transport-based Alignment Dynamics for Fast and Accurate Inference on Missing Modality Sequences

arXiv:2210.12798v1 | 295 citations | h-index: 77
Originality: Incremental advance
AI Analysis

The paper addresses the underexplored problem of randomly missing modalities in multimodal inference. The advance is incremental: it builds on existing methods, shifting the focus from direct reconstruction of missing inputs to alignment dynamics.

The paper tackles the problem of missing modality sequences in multimodal tasks by proposing MM-Align, which uses optimal transport-based alignment dynamics for indirect imputation and a denoising training algorithm, resulting in more accurate and faster inference with reduced overfitting across three datasets.

Existing multimodal tasks mostly target the complete-input-modality setting, i.e., each modality is either complete or completely missing in both training and test sets. However, randomly missing situations remain underexplored. In this paper, we present a novel approach named MM-Align to address the missing-modality inference problem. Concretely, we propose 1) an alignment dynamics learning module based on the theory of optimal transport (OT) for indirect missing data imputation; 2) a denoising training algorithm to simultaneously enhance the imputation results and backbone network performance. Compared with previous methods, which are devoted to reconstructing the missing inputs, MM-Align learns to capture and imitate the alignment dynamics between modality sequences. Results of comprehensive experiments on three datasets covering two multimodal tasks empirically demonstrate that our method can perform more accurate and faster inference and relieve overfitting under various missing conditions.
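
To make the optimal-transport idea in the abstract concrete, the following is a minimal sketch of how an OT plan between two modality sequences might be computed and used as a soft alignment. It is not the authors' implementation: the entropic Sinkhorn solver, the squared Euclidean cost, the uniform marginals, the function name `sinkhorn_plan`, and the random "text" and "audio" features are all assumptions chosen for illustration.

```python
import numpy as np

def sinkhorn_plan(x, y, eps=0.5, n_iters=200):
    """Entropy-regularised OT plan between two feature sequences.

    x: (n, d) array, y: (m, d) array. Uniform marginals are assumed.
    Illustrative only; not the solver used in the MM-Align paper.
    """
    n, m = x.shape[0], y.shape[0]
    # Squared Euclidean cost between every pair of time steps.
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / max(cost.max(), 1e-12)  # scale to [0, 1] so eps is comparable across inputs
    kernel = np.exp(-cost / eps)
    a = np.full(n, 1.0 / n)  # mass on each step of x
    b = np.full(m, 1.0 / m)  # mass on each step of y
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]

# Hypothetical inputs: 6 text steps and 9 audio steps, 8 features each.
rng = np.random.default_rng(0)
text = rng.normal(size=(6, 8))
audio = rng.normal(size=(9, 8))

plan = sinkhorn_plan(text, audio)
# Barycentric projection: an audio-space vector for each text step,
# formed as a plan-weighted average of the audio features.
imputed = (plan / plan.sum(axis=1, keepdims=True)) @ audio
```

Each entry `plan[i, j]` is the mass moved from step `i` of one sequence to step `j` of the other, so the plan acts as a soft alignment matrix. The barycentric projection in the last line is one simple way such an alignment could stand in for a missing sequence. The abstract describes learning to imitate alignment dynamics rather than solving for a plan directly at inference time, so this sketch covers only the underlying OT building block.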

Code Implementations: 1 repo
