CV · Mar 21, 2025

Feature-Based Dual Visual Feature Extraction Model for Compound Multimodal Emotion Recognition

Ran Liu, Fengyu Zhang, Cong Yu, Longjiang Yang, Zhuofan Wen, Siyuan Zhang, Hailiang Yao, Shun Chen, Zheng Lian, Bin Liu

arXiv:2503.17453v1 · 2 citations · h-index: 11 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses uncertainty and modal conflicts in compound emotion recognition for affective computing and human-computer interaction, but it appears incremental as it combines existing models.

The paper tackles compound multimodal emotion recognition by proposing a method that fuses Vision Transformer and Residual Network features, achieving superior performance on the C-EXPR-DB dataset in complex scenarios.

This article presents our results for the eighth Affective Behavior Analysis in-the-wild (ABAW) competition. Multimodal emotion recognition (ER) has important applications in affective computing and human-computer interaction. In the real world, however, compound emotion recognition faces greater problems of uncertainty and modal conflict. For the Compound Expression (CE) Recognition Challenge, this paper proposes a multimodal emotion recognition method that fuses the features of a Vision Transformer (ViT) and a Residual Network (ResNet). We conducted experiments on the C-EXPR-DB and MELD datasets. The results show that in scenarios with complex visual and audio cues (such as C-EXPR-DB), the model that fuses ViT and ResNet features exhibits superior performance. Our code is available at https://github.com/MyGitHub-ax/8th_ABAW
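As a rough illustration of what fusing ViT and ResNet features could look like, the sketch below concatenates per-frame feature vectors from the two backbones and passes them to a linear classifier. The abstract does not state the fusion rule, so the concatenation, the feature dimensions (768 and 2048), the seven output classes, and the random stand-in features and weights are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def fuse_features(vit_feat, resnet_feat):
    """Concatenate per-frame features from two visual backbones.

    Concatenation is an assumed fusion rule; the abstract does not specify one.
    """
    vit_feat = np.asarray(vit_feat, dtype=np.float64)
    resnet_feat = np.asarray(resnet_feat, dtype=np.float64)
    if vit_feat.shape[0] != resnet_feat.shape[0]:
        raise ValueError("both feature sets must cover the same number of frames")
    return np.concatenate([vit_feat, resnet_feat], axis=1)


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Hypothetical sizes: random arrays stand in for real backbone outputs.
rng = np.random.default_rng(0)
num_frames, vit_dim, resnet_dim, num_classes = 4, 768, 2048, 7

vit_feat = rng.standard_normal((num_frames, vit_dim))
resnet_feat = rng.standard_normal((num_frames, resnet_dim))

fused = fuse_features(vit_feat, resnet_feat)

# Untrained linear head, used only to show the shapes involved.
weights = rng.standard_normal((vit_dim + resnet_dim, num_classes)) * 0.01
probs = softmax(fused @ weights)
```

In a real pipeline the random arrays would be replaced by features extracted from the face crops of each frame, and the linear head would be trained on labelled data.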
