AI | LG | Jun 1, 2025

GIA-MIC: Multimodal Emotion Recognition with Gated Interactive Attention and Modality-Invariant Learning Constraints

arXiv:2506.00865v1 | 2 citations | h-index: 55 | INTERSPEECH
Originality: Incremental advance
AI Analysis

This work improves emotion recognition for human-computer interaction applications, representing an incremental advancement over existing attention-based fusion methods.

The paper addresses multimodal emotion recognition, targeting two challenges: extracting modality-specific features and capturing cross-modal similarities. The resulting method achieves 80.7% weighted accuracy (WA) and 81.3% unweighted accuracy (UA) on IEMOCAP, which the authors report as outperforming state-of-the-art approaches.
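For readers unfamiliar with the two metrics, the sketch below shows how WA and UA are commonly computed in emotion recognition work: WA is overall accuracy across all utterances, while UA is the mean of per-class recall, so minority classes count equally. The function names and the toy labels are illustrative assumptions, not taken from the paper or its evaluation code.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class contributes equally."""
    total = defaultdict(int)  # samples per true class
    hit = defaultdict(int)    # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    return sum(hit[c] / total[c] for c in total) / len(total)


# Toy, imbalanced example (hypothetical labels): class 0 dominates.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0]

wa = weighted_accuracy(y_true, y_pred)    # 7 of 8 correct
ua = unweighted_accuracy(y_true, y_pred)  # mean of recalls 6/6 and 1/2
print(f"WA={wa:.3f} UA={ua:.3f}")
```

On imbalanced data the two values can diverge noticeably, which is why results on IEMOCAP are usually reported with both.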

Multimodal emotion recognition (MER) extracts emotions from multimodal data, including visual, speech, and text inputs, playing a key role in human-computer interaction. Attention-based fusion methods dominate MER research, achieving strong classification performance. However, two key challenges remain: effectively extracting modality-specific features and capturing cross-modal similarities despite distribution differences caused by modality heterogeneity. To address these, we propose a gated interactive attention mechanism to adaptively extract modality-specific features while enhancing emotional information through pairwise interactions. Additionally, we introduce a modality-invariant generator to learn modality-invariant representations and constrain domain shifts by aligning cross-modal similarities. Experiments on IEMOCAP demonstrate that our method outperforms state-of-the-art MER approaches, achieving WA 80.7% and UA 81.3%.
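To make the idea of gating a pairwise cross-modal interaction concrete, here is a minimal NumPy sketch. It is not the paper's formulation: the abstract does not give the equations, so the function `gated_cross_attention`, the sigmoid gate over concatenated features, the parameters `w_gate` and `b_gate`, and all tensor shapes are assumptions chosen for illustration. One modality attends to another with scaled dot-product attention, and a learned gate decides, per feature, how much of the original modality-specific signal to keep versus how much cross-modal information to mix in.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_cross_attention(query_seq, context_seq, w_gate, b_gate):
    """Hypothetical gated pairwise interaction between two modalities.

    query_seq:   (Tq, d) features of the modality being refined
    context_seq: (Tc, d) features of the other modality
    w_gate:      (2 * d, d) gate weights (assumed parameterisation)
    b_gate:      (d,) gate bias
    """
    d = query_seq.shape[-1]
    # Scaled dot-product attention from the query modality to the context.
    scores = query_seq @ context_seq.T / np.sqrt(d)      # (Tq, Tc)
    attended = softmax(scores, axis=-1) @ context_seq    # (Tq, d)
    # The gate sees both the original and the attended features.
    gate_in = np.concatenate([query_seq, attended], axis=-1)
    gate = sigmoid(gate_in @ w_gate + b_gate)            # (Tq, d), in (0, 1)
    # gate near 1 keeps modality-specific features; near 0 favours cross-modal ones.
    return gate * query_seq + (1.0 - gate) * attended


# Random stand-ins for speech and text features (not real data).
rng = np.random.default_rng(0)
speech = rng.standard_normal((5, 8))   # 5 speech frames, 8-dim features
text = rng.standard_normal((7, 8))     # 7 text tokens, 8-dim features
w_gate = rng.standard_normal((16, 8)) * 0.1
b_gate = np.zeros(8)

fused = gated_cross_attention(speech, text, w_gate, b_gate)
print(fused.shape)
```

In a three-modality setting such as the one the abstract describes, a block like this would presumably be applied to each modality pair, though the actual wiring in GIA-MIC may differ.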

