May 7

Modality-Aware Contrastive and Uncertainty-Regularized Emotion Recognition

Yan Zhuang, Minhao Liu, Yanru Zhang, Jiawen Deng, Fuji Ren

arXiv:2605.06245

AI Analysis

For researchers in multimodal emotion recognition, this work provides a method to handle missing or inconsistent modalities, but the gains are incremental over existing approaches.

The paper tackles multimodal emotion recognition under heterogeneous modality combinations, proposing the MCUR framework that uses contrastive learning and uncertainty regularization. It achieves average F1 gains of 2.2% on MOSI, 2.67% on MOSEI, and 4.37% on IEMOCAP over existing methods.

Multimodal Emotion Recognition (MER) has attracted growing attention with the rapid advancement of human-computer interaction. However, different modalities differ substantially in semantics, quality, and availability. This produces highly heterogeneous modality combinations and makes consistent, reliable emotion understanding difficult. To address this challenge, we propose the Modality-Aware Contrastive and Uncertainty-Regularized (MCUR) framework, which approaches MER from the perspective of representation consistency and aims to enable robust emotion prediction across heterogeneous modality combinations. MCUR has two core components: (1) a Modality Combination-Based and Category-Based Contrastive Learning mechanism (MCB-CL), which pulls samples that share both the same emotion category and the same set of available modalities close together in the representation space; and (2) Sample-wise Uncertainty-Guided Regularization (SUGR), which adaptively assigns an uncertainty-based weight to each sample to guide optimization. Extensive experiments show that MCUR consistently outperforms existing methods, achieving average F1 gains of 2.2% on MOSI, 2.67% on MOSEI, and 4.37% on IEMOCAP.
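The abstract does not give the loss formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes MCB-CL behaves like a supervised contrastive (SupCon-style) loss in which a pair counts as positive only when both the emotion label and the modality-availability mask match. It also assumes SUGR resembles a heteroscedastic uncertainty weighting, exp(-s_i) * L_i + s_i, with a per-sample log-variance s_i. The function names, the mask encoding, and the temperature of 0.1 are all hypothetical.

```python
import math
from typing import Sequence, Tuple

# Hypothetical sketch of MCB-CL- and SUGR-like terms; assumed forms only.


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / max(na * nb, 1e-12)


def mcb_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    modality_masks: Sequence[Tuple[int, ...]],
    temperature: float = 0.1,
) -> float:
    """Assumed SupCon-style form of MCB-CL: a pair is positive only if
    both the emotion label and the available-modality mask match."""
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [
            j for j in range(n)
            if j != i
            and labels[j] == labels[i]
            and modality_masks[j] == modality_masks[i]
        ]
        if not positives:
            continue  # anchors without positives contribute nothing
        # Temperature-scaled similarity logits to every other sample.
        logits = {j: _cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits.values())
        log_denom = peak + math.log(
            sum(math.exp(v - peak) for v in logits.values()))
        # Mean negative log-probability over this anchor's positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def uncertainty_weighted_loss(
    per_sample_losses: Sequence[float],
    log_variances: Sequence[float],
) -> float:
    """Assumed SUGR-like term: exp(-s_i) * L_i + s_i, averaged over samples.
    A large s_i (high uncertainty) down-weights L_i, while the +s_i term
    penalizes declaring every sample uncertain."""
    terms = [math.exp(-s) * loss + s
             for loss, s in zip(per_sample_losses, log_variances)]
    return sum(terms) / len(terms)
```

For example, two samples that share label 0 but have different masks, (1, 1, 0) versus (1, 0, 1), form no positive pair under this sketch, so `mcb_contrastive_loss` returns 0.0. This mirrors the requirement that positives share the same modality combination as well as the same emotion category.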
