LG · AI · MM · Jul 9, 2025

HeLo: Heterogeneous Multi-Modal Fusion with Label Correlation for Emotion Distribution Learning

arXiv:2507.06821v3 · 5 citations · h-index: 6 · MM
Originality: Incremental advance
AI Analysis

This work addresses the challenge of accurately recognizing mixed emotions in human-computer interaction, representing an incremental improvement over existing methods.

The paper tackles the problem of multi-modal emotion distribution learning by proposing HeLo, a framework that fuses heterogeneous physiological and behavioral data while exploiting label correlations among basic emotions, achieving superior performance on two public datasets.

Multi-modal emotion recognition has attracted increasing attention in recent years because of its significant role in human-computer interaction (HCI). Because different discrete emotions can coexist, emotion distribution learning (EDL), which identifies a mixture of basic emotions, has gradually emerged as a trend over single-class emotion recognition. However, existing EDL methods struggle to mine the heterogeneity among multiple modalities, and they do not fully exploit the rich semantic correlations among basic emotions. In this paper, we propose HeLo, a multi-modal emotion distribution learning framework that aims to fully explore the heterogeneity and complementary information in multi-modal emotional data, together with the label correlation within mixed basic emotions. Specifically, we first adopt cross-attention to fuse the physiological data. Then, an optimal transport (OT)-based heterogeneity mining module is devised to mine the interaction and heterogeneity between the physiological and behavioral representations. To facilitate label correlation learning, we introduce a learnable label embedding optimized by correlation matrix alignment. Finally, the learnable label embeddings and label correlation matrices are integrated with the multi-modal representations through a novel label correlation-driven cross-attention mechanism for accurate emotion distribution learning. Experimental results on two publicly available datasets demonstrate the superiority of our proposed method in emotion distribution learning.
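
To make the optimal transport step mentioned in the abstract more concrete, here is a minimal sketch of entropic-regularized OT (the Sinkhorn iteration) between two small sets of feature vectors. It is a generic illustration, not the HeLo module itself: the abstract does not specify the cost function, the marginals, or the solver. The function names `sinkhorn_plan` and `sq_dist`, the squared Euclidean cost, the uniform marginals, the regularization strength `eps`, and the toy vectors are all assumptions made for this example.

```python
import math

def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropic-regularized OT plan between two uniform distributions.

    cost: n x m list of lists of pairwise costs.
    Returns an n x m plan whose rows sum to about 1/n and columns to 1/m.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on physiological features (assumption)
    b = [1.0 / m] * m  # uniform mass on behavioral features (assumption)
    # Gibbs kernel: low cost means high affinity.
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale to match the row marginals, then the column marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def sq_dist(x, y):
    """Squared Euclidean distance, used here as the transport cost (assumption)."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

# Toy, made-up vectors standing in for the two modality representations.
physio = [[0.0, 0.1], [1.0, 0.9], [0.5, 0.4]]
behav = [[0.1, 0.0], [0.9, 1.0]]

cost = [[sq_dist(p, q) for q in behav] for p in physio]
plan = sinkhorn_plan(cost)

# Total transport cost under the plan: a scalar measure of how far apart
# the two sets of representations are.
ot_cost = sum(plan[i][j] * cost[i][j]
              for i in range(len(physio)) for j in range(len(behav)))
```

In this sketch the plan gives a soft alignment between features of the two modalities, and `ot_cost` summarizes their discrepancy. How HeLo actually uses such quantities is described in the paper, not here.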

