LG · May 20

Leveraging Self-Paced Curriculum Learning for Enhanced Modality Balance in Multimodal Conversational Emotion Recognition

arXiv:2605.21565 · 57.3
Predicted impact: top 40% in LG · last 90 days · Originality: Incremental advance
AI Analysis

For researchers in multimodal emotion recognition, this work offers a lightweight, generalizable method to improve model robustness and performance. The contribution is incremental, since it applies existing curriculum learning to a known bottleneck.

The paper tackles modality misalignment and imbalanced learning in Multimodal Emotion Recognition in Conversations (MERC) by proposing a plug-and-play Self-Paced Curriculum Learning (SPCL) framework. On IEMOCAP, SPCL improves weighted F1-score by +1.2% to +6.6%, and on MELD, gains reach up to +10.4%.

Multimodal Emotion Recognition in Conversations (MERC) is a crucial task for understanding human interactions, and multimodal approaches that integrate language, facial expressions, and vocal tone have achieved significant progress. However, modality misalignment and imbalanced learning remain major challenges that limit the effective use of multimodal information. To address these challenges, we propose a plug-and-play framework based on Self-Paced Curriculum Learning (SPCL) for MERC. We introduce a dual-level Difficulty Measurer that captures both utterance-level and conversation-level challenges. The utterance-level score models fine-grained, modality-specific difficulty, while the conversation-level score captures broader dialogue structure, including emotional dependencies and modality coherence. Based on these scores, the Learning Scheduler dynamically guides training from easier to more difficult instances. By integrating SPCL into existing MERC architectures, our method alleviates modality imbalance and improves model robustness. Extensive experiments on the IEMOCAP and MELD datasets demonstrate consistent improvements across different architectures and modality settings. On IEMOCAP, SPCL improves weighted F1-score by approximately +1.2% to +6.6% over baseline models, while on MELD, gains reach up to +10.4%. These results highlight the effectiveness and generalizability of SPCL as a lightweight plug-and-play module for multimodal emotion recognition.
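The abstract does not give the formulas behind the Difficulty Measurer or the Learning Scheduler, so the following is only a minimal sketch of the general easy-to-hard idea, not the paper's method. It assumes, hypothetically, that utterance-level difficulty is the mean of per-modality losses, that conversation-level difficulty is the mean utterance difficulty within a dialogue, that the two are mixed with a weight `alpha`, and that the scheduler uses linear pacing. All names (`Utterance`, `combined_difficulty`, `pacing_fraction`, `select_indices`) are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Utterance:
    conv_id: str   # conversation this utterance belongs to
    losses: dict   # hypothetical per-modality losses, e.g. {"text": 0.2, "audio": 0.4}


def utterance_difficulty(u):
    # Assumption: mean per-modality loss stands in for utterance-level difficulty.
    return sum(u.losses.values()) / len(u.losses)


def combined_difficulty(utts, alpha=0.5):
    # Assumption: conversation-level difficulty is the mean utterance difficulty
    # of the conversation; the final score mixes both levels with weight alpha.
    by_conv = {}
    for u in utts:
        by_conv.setdefault(u.conv_id, []).append(utterance_difficulty(u))
    conv_score = {c: sum(v) / len(v) for c, v in by_conv.items()}
    return [
        alpha * utterance_difficulty(u) + (1 - alpha) * conv_score[u.conv_id]
        for u in utts
    ]


def pacing_fraction(epoch, total_epochs, start=0.5):
    # Linear pacing (an assumption): fraction of data grows from `start` to 1.0.
    if total_epochs <= 1:
        return 1.0
    return min(1.0, start + (1.0 - start) * epoch / (total_epochs - 1))


def select_indices(scores, epoch, total_epochs, start=0.5):
    # Keep the easiest fraction of samples for this epoch, returned in index order.
    k = max(1, math.ceil(pacing_fraction(epoch, total_epochs, start) * len(scores)))
    order = sorted(range(len(scores)), key=scores.__getitem__)
    return sorted(order[:k])


utts = [
    Utterance("c1", {"text": 0.2, "audio": 0.4}),
    Utterance("c1", {"text": 1.0, "audio": 1.4}),
    Utterance("c2", {"text": 0.1, "audio": 0.1}),
    Utterance("c2", {"text": 2.0, "audio": 2.0}),
]
scores = combined_difficulty(utts)
for epoch in range(3):
    print(epoch, select_indices(scores, epoch, total_epochs=3))
```

In a real training loop the selected indices would define the subset of utterances used in that epoch, and the losses would be refreshed from the current model so that the ordering stays self-paced. Because the selection only wraps the data pipeline, a scheme like this can sit in front of an existing MERC model without changing its architecture, which is consistent with the plug-and-play framing above.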

