Facial Affect Recognition based on Multi Architecture Encoder and Feature Fusion for the ABAW7 Challenge
This work addresses facial affect recognition for applications in human-computer interaction, but its contribution is incremental: it builds on existing models with only minor modifications.
The paper tackles facial affect recognition for the ABAW7 challenge with a multi-architecture encoder and feature fusion, and reports results that significantly outperform the baselines on the valence-arousal estimation, expression classification, and action unit detection sub-challenges.
In this paper, we present our approach to the 7th ABAW competition, which comprises three sub-challenges: Valence-Arousal (VA) estimation, Expression (Expr) classification, and Action Unit (AU) detection. We employ several state-of-the-art models as encoders to extract visual features, and a Transformer encoder then fuses these features for the VA, Expr, and AU sub-challenges. Because the encoders produce features of different dimensions, we introduce an affine module that projects them to a common dimension. Overall, our results significantly outperform the baselines.
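The abstract does not specify the encoders, how features are arranged into tokens, or how the prediction heads are built, so the following is only a minimal NumPy sketch of the general pipeline it describes, under stated assumptions. The encoder output sizes (768, 512, 1024) are hypothetical; each aligned feature is assumed to become one token; the fusion step is a single-layer, single-head post-norm Transformer encoder followed by mean pooling; and the head sizes (2 VA outputs, 8 expression classes, 12 AUs) are assumptions based on the usual Aff-Wild2 annotations. The names `AffineAlign`, `EncoderLayer`, and `FusionModel` are hypothetical, and all weights are random and untrained, so only the output shapes and ranges are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance (no learned scale/shift).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class AffineAlign:
    """Hypothetical affine module: projects a d_in-dim feature to the common dim d."""
    def __init__(self, d_in, d):
        self.W = rng.normal(0, d_in ** -0.5, size=(d_in, d))
        self.b = np.zeros(d)

    def __call__(self, x):
        return x @ self.W + self.b

class EncoderLayer:
    """One post-norm Transformer encoder layer with single-head self-attention."""
    def __init__(self, d, d_ff):
        s = d ** -0.5
        self.Wq, self.Wk, self.Wv, self.Wo = (rng.normal(0, s, size=(d, d)) for _ in range(4))
        self.W1 = rng.normal(0, s, size=(d, d_ff))
        self.W2 = rng.normal(0, d_ff ** -0.5, size=(d_ff, d))

    def __call__(self, x):  # x: (n_tokens, d)
        q, k, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(x.shape[-1]))  # (n_tokens, n_tokens)
        x = layer_norm(x + (attn @ v) @ self.Wo)         # attention + residual
        ff = np.maximum(0.0, x @ self.W1) @ self.W2      # ReLU feed-forward
        return layer_norm(x + ff)                        # feed-forward + residual

class FusionModel:
    """Align features, fuse them with the encoder, and predict VA, Expr, and AU."""
    def __init__(self, feat_dims, d=256, n_expr=8, n_au=12):
        self.align = [AffineAlign(di, d) for di in feat_dims]
        self.encoder = EncoderLayer(d, 4 * d)
        self.head_va = rng.normal(0, d ** -0.5, size=(d, 2))
        self.head_expr = rng.normal(0, d ** -0.5, size=(d, n_expr))
        self.head_au = rng.normal(0, d ** -0.5, size=(d, n_au))

    def __call__(self, feats):
        # One token per encoder after projecting to the common dimension d.
        tokens = np.stack([a(f) for a, f in zip(self.align, feats)])  # (n_encoders, d)
        fused = self.encoder(tokens).mean(axis=0)                     # (d,)
        va = np.tanh(fused @ self.head_va)                    # valence, arousal in [-1, 1]
        expr = softmax(fused @ self.head_expr)                # expression class probabilities
        au = 1.0 / (1.0 + np.exp(-(fused @ self.head_au)))    # independent AU probabilities
        return va, expr, au

# Usage with hypothetical encoder output sizes.
feat_dims = [768, 512, 1024]
model = FusionModel(feat_dims)
feats = [rng.normal(size=di) for di in feat_dims]
va, expr, au = model(feats)
print(va.shape, expr.shape, au.shape)  # (2,) (8,) (12,)
```

In this sketch, projecting every encoder output into the same d-dimensional space is what allows the features to be stacked as tokens of a single sequence, so that one set of attention weights can relate them to each other.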