LG · Sep 30, 2025

MIDAS: Misalignment-based Data Augmentation Strategy for Imbalanced Multimodal Learning

Seong-Hyeon Hwang, Soyoung Choi, Steven Euijong Whang

arXiv:2509.25831v1 · 1 citation · h-index: 28

Originality: Highly original

AI Analysis

This paper addresses modality imbalance in multimodal learning, offering an incremental improvement over prior data-centric solutions.

The paper tackles the problem of multimodal models over-relying on dominant modalities by proposing MIDAS, a data augmentation strategy that generates misaligned samples with inconsistent cross-modal information, and it significantly outperforms baselines on multimodal classification benchmarks.

Multimodal models often over-rely on dominant modalities, failing to achieve optimal performance. While prior work focuses on modifying training objectives or optimization procedures, data-centric solutions remain underexplored. We propose MIDAS, a novel data augmentation strategy that generates misaligned samples with semantically inconsistent cross-modal information, labeled using unimodal confidence scores to compel learning from contradictory signals. However, this confidence-based labeling can still favor the more confident modality. To address this within our misaligned samples, we introduce weak-modality weighting, which dynamically increases the loss weight of the least confident modality, thereby helping the model fully utilize the weaker modality. Furthermore, when misaligned features exhibit greater similarity to the aligned features, these misaligned samples pose a greater challenge, thereby enabling the model to better distinguish between classes. To leverage this, we propose hard-sample weighting, which prioritizes such semantically ambiguous misaligned samples. Experiments on multiple multimodal classification benchmarks demonstrate that MIDAS significantly outperforms related baselines in addressing modality imbalance.
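The abstract names three ingredients (confidence-based labels for misaligned samples, weak-modality weighting, and hard-sample weighting) without giving formulas. The sketch below is one possible reading for a two-modality setup, not the authors' implementation. Every function name, the proportional label split, the `boost` parameter, and the `1 + cosine` weighting rule are assumptions made purely for illustration.

```python
import math
import random


def misaligned_pairs(labels, rng):
    """Pair each sample's modality-A input with a modality-B input of a different class."""
    pairs = []
    for i, y in enumerate(labels):
        candidates = [j for j, other in enumerate(labels) if other != y]
        if candidates:
            pairs.append((i, rng.choice(candidates)))
    return pairs


def confidence_soft_label(y_a, y_b, conf_a, conf_b, num_classes):
    """Assumed rule: split label mass between the two classes by unimodal confidence."""
    total = conf_a + conf_b
    share_a = conf_a / total if total > 0 else 0.5
    target = [0.0] * num_classes
    target[y_a] += share_a
    target[y_b] += 1.0 - share_a
    return target


def weak_modality_weights(conf_a, conf_b, boost=1.0):
    """Assumed rule: raise the loss weight of the less confident modality by `boost`."""
    if conf_a == conf_b:
        return 1.0, 1.0
    return (1.0 + boost, 1.0) if conf_a < conf_b else (1.0, 1.0 + boost)


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def hard_sample_weight(misaligned_feat, aligned_feat):
    """Assumed rule: weight in [1, 2] that grows with similarity to the aligned feature."""
    return 1.0 + max(0.0, cosine(misaligned_feat, aligned_feat))


def misaligned_loss(log_probs, y_a, y_b, conf_a, conf_b, sample_weight):
    """Weighted soft-label cross-entropy for one misaligned sample (y_a != y_b)."""
    target = confidence_soft_label(y_a, y_b, conf_a, conf_b, len(log_probs))
    w_a, w_b = weak_modality_weights(conf_a, conf_b)
    loss = -(w_a * target[y_a] * log_probs[y_a] + w_b * target[y_b] * log_probs[y_b])
    return sample_weight * loss
```

In this reading, a misaligned sample whose modality-A input is confidently class 0 and whose modality-B input is weakly class 2 gets a soft label leaning toward class 0, while the class-2 term of the loss is up-weighted so the weaker modality still contributes. The paper itself should be consulted for the actual definitions.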
