V-NAW: Video-based Noise-aware Adaptive Weighting for Facial Expression Recognition
This work addresses performance degradation in facial expression recognition for applications such as human-computer interaction. Its contribution is incremental in that it targets bottlenecks that are already well known.
The paper tackles label ambiguity and class imbalance in video-based facial expression recognition. It proposes V-NAW, which adaptively weights the frames of a clip, together with an augmentation strategy that reduces redundancy between consecutive frames. The authors report significant performance improvements in their experiments.
Facial Expression Recognition (FER) plays a crucial role in human affective analysis and has been widely applied in computer vision tasks such as human-computer interaction and psychological assessment. The 8th Affective Behavior Analysis in-the-Wild (ABAW) Challenge aims to assess human emotions using the video-based Aff-Wild2 dataset. The challenge comprises several tasks, among them the video-based EXPR recognition track, which is our primary focus. In this paper, we demonstrate that addressing label ambiguity and class imbalance, both known to cause performance degradation, can lead to meaningful performance improvements. Specifically, we propose Video-based Noise-aware Adaptive Weighting (V-NAW), which adaptively assigns an importance weight to each frame in a clip in order to address label ambiguity and effectively capture temporal variations in facial expressions. Furthermore, we introduce a simple and effective augmentation strategy that reduces the redundancy between consecutive frames, a primary cause of overfitting. Through extensive experiments, we validate the effectiveness of our approach, demonstrating significant improvements in video-based FER performance.
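The abstract does not give the V-NAW formulation or the augmentation details, so the following is only an illustrative sketch of the two general ideas, not the authors' method. It assumes a hypothetical weighting rule in which each frame's weight is a softmax over the model's predicted probability for the annotated label (frames that disagree with a possibly noisy label contribute less), and a hypothetical strided sampler that skips neighboring frames to reduce redundancy. The function names, the `temperature` parameter, and the `max_stride` parameter are all assumptions introduced for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def frame_weights(label_probs, temperature=1.0):
    """Hypothetical per-frame weights (NOT the paper's formula).

    label_probs[i] is the model's predicted probability of the annotated
    label for frame i. Frames where the model agrees with the label get
    larger weights; the weights sum to 1 over the clip.
    """
    return softmax([p / temperature for p in label_probs])


def weighted_clip_loss(label_probs, temperature=1.0, eps=1e-12):
    """Cross-entropy per frame, combined with the adaptive weights above."""
    weights = frame_weights(label_probs, temperature)
    return sum(w * -math.log(p + eps) for w, p in zip(weights, label_probs))


def sample_sparse_clip(num_frames, clip_len, max_stride, rng):
    """Hypothetical augmentation: sample clip_len frame indices with a
    random stride so that near-duplicate neighboring frames are skipped.

    If the video is shorter than the requested clip, the last frame index
    is repeated to pad the clip.
    """
    # Largest stride that still fits clip_len frames inside the video.
    largest = (num_frames - 1) // (clip_len - 1) if clip_len > 1 else 1
    stride = rng.randint(1, max(1, min(max_stride, largest)))
    span = (clip_len - 1) * stride
    start = rng.randint(0, max(0, num_frames - 1 - span))
    return [min(start + i * stride, num_frames - 1) for i in range(clip_len)]


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.1]  # third frame disagrees with the clip label
    print("weights:", frame_weights(probs))
    print("weighted loss:", weighted_clip_loss(probs))
    print("indices:", sample_sparse_clip(100, 8, 4, random.Random(0)))
```

Under this assumed rule, the low-confidence third frame receives the smallest weight, so the clip loss is lower than a plain average of the per-frame losses. In a real training loop the weights would typically be computed from detached model outputs and the sampler applied when building each clip.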