CV · AI · Mar 22, 2024

Multimodal Fusion with Pre-Trained Model Features in Affective Behaviour Analysis In-the-wild

arXiv:2403.15044v1 · 3 citations · h-index: 17 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses emotion analysis in uncontrolled environments, but it is incremental as it applies existing techniques to a specific domain.

The paper tackles expression recognition and valence-arousal estimation in affective behavior analysis by combining multimodal fusion methods with pre-trained model features, achieving competitive performance on the Aff-Wild2 database.

Multimodal fusion is a significant method for most multimodal tasks. With the recent surge in the number of large pre-trained models, combining multimodal fusion methods with pre-trained model features can achieve outstanding performance in many multimodal tasks. In this paper, we present an approach that leverages both to address Expression (Expr) Recognition and Valence-Arousal (VA) Estimation. We run pre-trained models on the Aff-Wild2 database and extract their final hidden layers as features. After preprocessing, the extracted features are aligned by interpolation or convolution, and different models are then employed for modal fusion. Our code is available on GitHub at FulgenceWen/ABAW6th.
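
The alignment step in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' repository. It assumes two per-clip feature sequences with different lengths, one visual and one audio, resamples the audio sequence to the visual length by linear interpolation, and then concatenates the two along the feature axis. The function names, the array shapes, and the use of plain concatenation are all assumptions made for illustration. The paper uses learned fusion models at this stage, and concatenation only stands in for them here.

```python
import numpy as np


def align_by_interpolation(features: np.ndarray, target_len: int) -> np.ndarray:
    """Linearly resample a (T, D) feature sequence to (target_len, D).

    Hypothetical helper: each feature dimension is interpolated
    independently over a normalised time axis in [0, 1].
    """
    src_len, dim = features.shape
    src_t = np.linspace(0.0, 1.0, num=src_len)
    dst_t = np.linspace(0.0, 1.0, num=target_len)
    columns = [np.interp(dst_t, src_t, features[:, d]) for d in range(dim)]
    return np.stack(columns, axis=1)


def fuse_by_concatenation(visual: np.ndarray, audio: np.ndarray) -> np.ndarray:
    """Align audio to the visual frame count, then concatenate per frame.

    Concatenation is a placeholder for a learned fusion model.
    """
    aligned_audio = align_by_interpolation(audio, visual.shape[0])
    return np.concatenate([visual, aligned_audio], axis=1)


# Made-up shapes: 30 visual frames with 8-dim features,
# 50 audio steps with 4-dim features.
rng = np.random.default_rng(0)
visual = rng.normal(size=(30, 8))
audio = rng.normal(size=(50, 4))

fused = fuse_by_concatenation(visual, audio)
print(fused.shape)  # (30, 12)
```

Interpolation keeps the feature dimension unchanged and only changes the number of time steps. A strided 1-D convolution, the other alignment option named in the abstract, would instead learn how to pool neighbouring steps.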

