CL, SD, AS | Mar 26, 2025

GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations

arXiv:2503.20919v1 | 9 citations | h-index: 5 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of dynamic emotion recognition in conversations for affective computing applications, offering an incremental improvement with enhanced interpretability.

The paper tackles emotion recognition in conversations by proposing GatedxLSTM, a multimodal model that combines the speech and transcripts of both the speaker and their conversational partner(s) to identify the sentences that drive emotional shifts. On the IEMOCAP dataset, it reports state-of-the-art performance among open-source methods for four-class emotion classification.

Affective Computing (AC) is essential for advancing Artificial General Intelligence (AGI), with emotion recognition serving as a key component. However, human emotions are inherently dynamic, influenced not only by an individual's expressions but also by interactions with others, and single-modality approaches often fail to capture their full dynamics. Multimodal Emotion Recognition (MER) leverages multiple signals but traditionally relies on utterance-level analysis, overlooking the dynamic nature of emotions in conversations. Emotion Recognition in Conversation (ERC) addresses this limitation, yet existing methods struggle to align multimodal features and explain why emotions evolve within dialogues. To bridge this gap, we propose GatedxLSTM, a novel speech-text multimodal ERC model that explicitly considers voice and transcripts of both the speaker and their conversational partner(s) to identify the most influential sentences driving emotional shifts. By integrating Contrastive Language-Audio Pretraining (CLAP) for improved cross-modal alignment and employing a gating mechanism to emphasise emotionally impactful utterances, GatedxLSTM enhances both interpretability and performance. Additionally, the Dialogical Emotion Decoder (DED) refines emotion predictions by modelling contextual dependencies. Experiments on the IEMOCAP dataset demonstrate that GatedxLSTM achieves state-of-the-art (SOTA) performance among open-source methods in four-class emotion classification. These results validate its effectiveness for ERC applications and provide an interpretability analysis from a psychological perspective.
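The gating idea in the abstract (emphasising emotionally impactful utterances so they can be inspected afterwards) can be illustrated with a minimal sketch. This is not the paper's architecture: the function name `gate_utterances`, the single linear scoring layer, the toy feature vectors, and the weights are all assumptions made for illustration, and the real model operates on CLAP embeddings inside an xLSTM rather than on hand-written lists.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, maps any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate_utterances(features, weights, bias=0.0):
    """Score each utterance with a scalar gate and pool the dialogue.

    features: list of per-utterance feature vectors (hypothetical stand-ins
              for fused speech-text embeddings).
    weights, bias: parameters of an assumed linear scoring layer.
    Returns (gates, pooled), where gates[i] is the gate of utterance i and
    pooled is the gate-weighted average of the feature vectors.
    """
    gates = [
        sigmoid(sum(w * x for w, x in zip(weights, f)) + bias)
        for f in features
    ]
    total = sum(gates)  # always positive because each gate is in (0, 1)
    dim = len(weights)
    pooled = [
        sum(g * f[i] for g, f in zip(gates, features)) / total
        for i in range(dim)
    ]
    return gates, pooled


# Toy dialogue with three utterances; the second has the largest features.
features = [[0.1, 0.0], [2.0, 1.5], [0.2, -0.1]]
weights = [1.0, 1.0]  # illustrative values, not learned parameters

gates, pooled = gate_utterances(features, weights)
most_influential = max(range(len(gates)), key=gates.__getitem__)
```

Reading off the index of the largest gate is what makes a mechanism of this kind inspectable: the gate values indicate which utterances contributed most to the pooled context used for prediction.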
