Mitigating Modality Imbalance in Multi-modal Learning via Multi-objective Optimization
This work addresses a key bottleneck in multi-modal learning, offering researchers and practitioners a more efficient way to balance modalities, although the contribution is incremental because it builds on existing methods.
The paper tackles modality imbalance in multi-modal learning, which can cause multi-modal models to underperform even single-modality approaches. It reformulates the learning problem as a multi-objective optimization problem and proposes a gradient-based algorithm to solve it. Reported results include improved performance over existing baselines and up to ~20x reduction in subroutine computation time.
Multi-modal learning (MML) aims to integrate information from multiple modalities, which is expected to lead to superior performance over single-modality learning. However, recent studies have shown that MML can underperform, even compared to single-modality approaches, due to imbalanced learning across modalities. Methods have been proposed to alleviate this imbalance issue using different heuristics, which often lead to computationally intensive subroutines. In this paper, we reformulate the MML problem as a multi-objective optimization (MOO) problem that overcomes the imbalanced learning issue among modalities and propose a gradient-based algorithm to solve the modified MML problem. We provide convergence guarantees for the proposed method, and empirical evaluations on popular MML benchmarks showcasing the improved performance of the proposed method over existing balanced MML and MOO baselines, with up to ~20x reduction in subroutine computation time. Our code is available at https://github.com/heshandevaka/MIMO.
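To make the idea of a gradient-based MOO step concrete, the sketch below shows the classic min-norm combination of two objective gradients (in the style of the multiple-gradient descent algorithm). It is an illustration only and is not the algorithm proposed in the paper: the function names `dot` and `min_norm_direction` are hypothetical, and treating the two gradients as coming from two per-modality losses is an assumption made for the example.

```python
from typing import List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def min_norm_direction(g1: List[float], g2: List[float]) -> Tuple[float, List[float]]:
    """Return (alpha, d) where d = alpha*g1 + (1-alpha)*g2 has minimum norm.

    g1 and g2 stand in for the gradients of two objectives (assumed here to
    be two per-modality losses). The weight alpha is restricted to [0, 1].
    """
    diff = [a - b for a, b in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        # Identical gradients: every convex combination is the same vector.
        alpha = 0.5
    else:
        # Unconstrained minimizer of ||g2 + alpha*(g1 - g2)||^2, then clipped.
        alpha = dot([b - a for a, b in zip(g1, g2)], g2) / denom
        alpha = min(1.0, max(0.0, alpha))
    d = [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]
    return alpha, d


if __name__ == "__main__":
    # Toy gradients for two objectives that pull in different directions.
    alpha, d = min_norm_direction([1.0, 0.0], [0.0, 1.0])
    print(alpha, d)
```

Stepping along the negative of `d` does not increase either objective to first order, which is why min-norm combinations are a common building block for balancing competing losses. When one gradient is much larger than the other, the clipping makes the result fall back to the smaller gradient rather than letting the larger objective dominate.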