CV · Mar 13, 2025

Mamba-VA: A Mamba-based Approach for Continuous Emotion Recognition in Valence-Arousal Space

Yuheng Liang, Zheyu Wang, Feng Liu, Mingzhou Liu, Yu Yao

arXiv:2503.10104v1 · 13.110 citations · h-index: 2 · Has Code · 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Originality: Incremental advance

AI Analysis

This work addresses emotion recognition for applications such as human-computer interaction and mental health monitoring, but the contribution is incremental: it adapts existing architectures (MAE, TCN, and Mamba) to valence-arousal estimation.

The paper tackles continuous emotion recognition in valence-arousal space by proposing Mamba-VA, a model that chains an MAE feature extractor, a TCN, and Mamba for temporal modeling. It reports valence and arousal scores of 0.5362 and 0.4310 on the ABAW validation set (0.5036 and 0.4119 on the test set), outperforming the baseline.

Continuous Emotion Recognition (CER) plays a crucial role in intelligent human-computer interaction, mental health monitoring, and autonomous driving. Emotion modeling based on the Valence-Arousal (VA) space enables a more nuanced representation of emotional states. However, existing methods still face challenges in handling long-term dependencies and capturing complex temporal dynamics. To address these issues, this paper proposes a novel emotion recognition model, Mamba-VA, which leverages the Mamba architecture to efficiently model sequential emotional variations in video frames. First, the model employs a Masked Autoencoder (MAE) to extract deep visual features from video frames, enhancing the robustness of temporal information. Then, a Temporal Convolutional Network (TCN) is utilized for temporal modeling to capture local temporal dependencies. Subsequently, Mamba is applied for long-sequence modeling, enabling the learning of global emotional trends. Finally, a fully connected (FC) layer performs regression analysis to predict continuous valence and arousal values. Experimental results on the Valence-Arousal (VA) Estimation task of the 8th competition on Affective Behavior Analysis in-the-wild (ABAW) demonstrate that the proposed model achieves valence and arousal scores of 0.5362 (0.5036) and 0.4310 (0.4119) on the validation (test) set, respectively, outperforming the baseline. The source code is available on GitHub: https://github.com/FreedomPuppy77/Charon.
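The data flow described in the abstract (per-frame features, then local temporal modeling, then long-sequence modeling, then a regression head) can be illustrated with a small, dependency-free sketch. This is not the authors' implementation; their code is in the linked repository. Everything here is an assumption made for illustration: random vectors stand in for MAE features, a stack of causal dilated convolutions stands in for the TCN, a fixed linear recurrence stands in for Mamba (which additionally makes its state-space parameters input-dependent), and a `tanh` output is assumed so that predictions stay in [-1, 1]. All function names, kernel weights, and sizes are hypothetical.

```python
import math
import random


def causal_dilated_conv(seq, kernel, dilation):
    """TCN-style causal convolution over time, followed by ReLU.

    seq is a list of T frames, each a list of D floats. The same kernel is
    applied to every channel (a simplification). The output at time t uses
    only frames t, t - dilation, t - 2 * dilation, ... (zero-padded).
    """
    num_channels = len(seq[0])
    out = []
    for t in range(len(seq)):
        frame = [0.0] * num_channels
        for k, weight in enumerate(kernel):
            src = t - k * dilation
            if src >= 0:
                for d in range(num_channels):
                    frame[d] += weight * seq[src][d]
        out.append([max(0.0, v) for v in frame])
    return out


def linear_state_scan(seq, decay=0.9):
    """Fixed linear recurrence h_t = decay * h_{t-1} + (1 - decay) * x_t.

    A stand-in for the long-sequence stage: Mamba uses a selective state-space
    model whose parameters depend on the input, which is omitted here.
    """
    state = [0.0] * len(seq[0])
    out = []
    for frame in seq:
        state = [decay * h + (1.0 - decay) * x for h, x in zip(state, frame)]
        out.append(list(state))
    return out


def regression_head(seq, weights, bias):
    """Fully connected layer mapping each frame to (valence, arousal).

    The tanh squashing is an assumption, not something the abstract states.
    """
    return [
        tuple(
            math.tanh(sum(w * x for w, x in zip(row, frame)) + b)
            for row, b in zip(weights, bias)
        )
        for frame in seq
    ]


def predict_valence_arousal(features, seed=0):
    """Chain the three stages; weights are random because nothing is trained."""
    rng = random.Random(seed)
    num_channels = len(features[0])
    x = features
    for dilation in (1, 2, 4):  # growing receptive field, as in a typical TCN
        x = causal_dilated_conv(x, kernel=[0.5, 0.3, 0.2], dilation=dilation)
    x = linear_state_scan(x, decay=0.9)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(num_channels)] for _ in range(2)]
    return regression_head(x, weights, bias=[0.0, 0.0])


# Hypothetical input: 16 frames of 8-dimensional features in place of MAE embeddings.
feature_rng = random.Random(42)
frames = [[feature_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
preds = predict_valence_arousal(frames)
print(len(preds), "frames, first prediction (valence, arousal):", preds[0])
```

Both temporal stages in this sketch are causal, so the prediction for a frame depends only on that frame and earlier ones. Whether the published model is causal or uses bidirectional context is not stated in the abstract.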
