CV, AI | Mar 22, 2024

Multimodal Fusion with Pre-Trained Model Features in Affective Behaviour Analysis In-the-wild

Zhuofan Wen, Fengyu Zhang, Siyuan Zhang, Haiyang Sun, Mingyu Xu, Licai Sun, Zheng Lian, Bin Liu, Jianhua Tao

arXiv:2403.15044v16.53 citations | h-index: 17 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses emotion analysis in uncontrolled environments, but it is incremental as it applies existing techniques to a specific domain.

The paper tackles expression recognition and valence-arousal estimation in affective behavior analysis by combining multimodal fusion methods with pre-trained model features, achieving competitive performance on the Aff-Wild2 database.

Multimodal fusion is a core technique in most multimodal tasks. With the recent surge in large pre-trained models, combining multimodal fusion methods with pre-trained model features can achieve outstanding performance on many of these tasks. In this paper, we present an approach that leverages both for Expression (Expr) Recognition and Valence-Arousal (VA) Estimation. We run pre-trained models on the Aff-Wild2 database and extract their final hidden layers as features. After preprocessing, the extracted features are aligned by interpolation or convolution, and different models are then employed for modal fusion. Our code is available at GitHub - FulgenceWen/ABAW6th.
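The alignment step mentioned in the abstract can be illustrated with a minimal sketch. It assumes two per-clip feature sequences of different lengths, resamples one onto the other's time axis by linear interpolation, and fuses them by simple concatenation. The function names, feature shapes, and the concatenation fusion are illustrative assumptions, not the authors' implementation; the abstract does not specify which fusion models are used.

```python
import numpy as np


def align_by_interpolation(features: np.ndarray, target_len: int) -> np.ndarray:
    """Linearly resample a (T, D) feature sequence to (target_len, D)."""
    src_len, dim = features.shape
    # Place source and target time steps on a shared [0, 1] axis.
    src_t = np.linspace(0.0, 1.0, num=src_len)
    dst_t = np.linspace(0.0, 1.0, num=target_len)
    # np.interp works on 1-D data, so interpolate each feature dimension separately.
    columns = [np.interp(dst_t, src_t, features[:, d]) for d in range(dim)]
    return np.stack(columns, axis=1)


def fuse_by_concatenation(visual: np.ndarray, audio: np.ndarray) -> np.ndarray:
    """Align audio to the visual frame count, then concatenate per frame.

    Concatenation is a stand-in (assumption) for the fusion models in the paper.
    """
    aligned_audio = align_by_interpolation(audio, visual.shape[0])
    return np.concatenate([visual, aligned_audio], axis=1)


# Hypothetical shapes: 30 video frames with 8-dim features,
# 50 audio steps with 4-dim features.
rng = np.random.default_rng(0)
visual = rng.normal(size=(30, 8))
audio = rng.normal(size=(50, 4))

fused = fuse_by_concatenation(visual, audio)
print(fused.shape)  # (30, 12)
```

Interpolation keeps the first and last time steps of the resampled sequence fixed, which is why it is a common choice when modalities are sampled at different rates; a strided convolution would be the learned alternative the abstract alludes to.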
