DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning
DynCIM addresses modality and sample imbalances in multimodal learning, enhancing adaptability and robustness, though the contribution is incremental because it builds on existing curriculum learning and fusion techniques.
The paper tackles imbalanced multimodal learning by introducing DynCIM, a dynamic curriculum learning framework that quantifies imbalances from both sample and modality perspectives; the authors report that it consistently outperforms state-of-the-art methods on six multimodal benchmarking datasets.
Multimodal learning integrates complementary information from diverse modalities to enhance the decision-making process. However, the potential of multimodal collaboration remains under-exploited due to disparities in data quality and modality representation capabilities. To address this, we introduce DynCIM, a novel dynamic curriculum learning framework designed to quantify the inherent imbalances from both sample and modality perspectives. DynCIM employs a sample-level curriculum to dynamically assess each sample's difficulty according to prediction deviation, consistency, and stability, while a modality-level curriculum measures modality contributions from global and local perspectives. Furthermore, a gating-based dynamic fusion mechanism is introduced to adaptively adjust modality contributions, minimizing redundancy and optimizing fusion effectiveness. Extensive experiments on six multimodal benchmarking datasets, spanning both bimodal and trimodal scenarios, demonstrate that DynCIM consistently outperforms state-of-the-art methods. Our approach effectively mitigates modality and sample imbalances while enhancing adaptability and robustness in multimodal learning tasks. Our code is available at https://github.com/Raymond-Qiancx/DynCIM.
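The abstract names the ingredients of the sample-level curriculum (prediction deviation, consistency, stability) and a gating-based fusion step, but does not give their formulas. The sketch below is only an illustration of how such signals might be combined, not the authors' method. The function names `sample_difficulty` and `gated_fusion`, the equal weighting, the averaging of per-modality probabilities, and the softmax gate are all assumptions made for this example; the actual definitions are in the paper and the linked repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


def sample_difficulty(modality_probs, label, prev_confidence,
                      weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical difficulty score; higher means harder.

    modality_probs: one class-probability list per modality.
    label: index of the true class.
    prev_confidence: true-class confidence from the previous epoch.
    The result lies in [0, 1] when the weights sum to 1.
    """
    n_mod = len(modality_probs)
    n_cls = len(modality_probs[0])

    # Assumption: the fused prediction is the plain average over modalities.
    fused = [sum(p[c] for p in modality_probs) / n_mod for c in range(n_cls)]
    confidence = fused[label]

    # Deviation: how far the fused prediction is from the true label.
    deviation = 1.0 - confidence

    # Inconsistency: share of modalities that disagree with the fused class.
    fused_pred = argmax(fused)
    inconsistency = sum(argmax(p) != fused_pred for p in modality_probs) / n_mod

    # Instability: change in true-class confidence since the last epoch.
    instability = abs(confidence - prev_confidence)

    w_dev, w_con, w_sta = weights
    return w_dev * deviation + w_con * inconsistency + w_sta * instability


def gated_fusion(features, gate_logits):
    """Hypothetical gate: softmax-weighted sum of per-modality feature vectors."""
    gates = softmax(gate_logits)
    dim = len(features[0])
    return [sum(g * f[i] for g, f in zip(gates, features)) for i in range(dim)]


# A confident, consistent, stable sample versus a conflicting, unstable one.
easy = sample_difficulty([[0.9, 0.1], [0.8, 0.2]], label=0, prev_confidence=0.85)
hard = sample_difficulty([[0.6, 0.4], [0.2, 0.8]], label=0, prev_confidence=0.9)

# Equal gate logits give each modality the same weight.
fused = gated_fusion([[1.0, 0.0], [0.0, 1.0]], gate_logits=[0.0, 0.0])
```

In a curriculum of this kind, a score like `sample_difficulty` could be used to reweight or order training samples, and in practice the gate logits would come from a learned network rather than being passed in by hand.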