Emotic Masked Autoencoder with Attention Fusion for Facial Expression Recognition
This work addresses the limited size of facial expression recognition (FER) datasets in computer vision, but it appears incremental, as it builds on existing methods such as MAE-Face and attention mechanisms.
The paper tackles the problem of limited datasets in Facial Expression Recognition (FER) by integrating a self-supervised learning method with a multi-view Fusion Attention mechanism, which improves performance on the Aff-Wild2 dataset.
Facial Expression Recognition (FER) is a critical computer vision task with applications across many domains. Limited FER datasets hamper the generalization of expression recognition models, so addressing this limitation is essential for improving performance. Our paper presents an approach that integrates the MAE-Face self-supervised learning (SSL) method with a multi-view Fusion Attention mechanism for expression classification, and we evaluate it in the 6th Affective Behavior Analysis in-the-wild (ABAW) competition. Our work uses low-level feature information from the ipsilateral (auxiliary) view before the model learns the high-level features that capture changes in human facial expression, offering a straightforward way to improve the representation of the examined (main) view. We also propose easy-to-implement, training-free frameworks that highlight key facial features, to determine whether such features, focused on pivotal local regions, can guide the model. Improvements in model performance on the Aff-Wild2 dataset, observed on both the training and validation sets, support the efficacy of this method.
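To make the view-fusion idea more concrete, the following is a minimal sketch of one common way to inject auxiliary-view features into a main view: single-head cross-attention with a residual connection, written in NumPy. The abstract does not specify the exact Fusion Attention design, so this is an illustration under stated assumptions, not the authors' implementation. The function name `fuse_views`, the projection matrices `w_q`, `w_k`, `w_v`, the token counts, and the choice of main-view tokens as queries with auxiliary-view low-level tokens as keys and values are all hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the axis for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_views(main_tokens, aux_tokens, w_q, w_k, w_v):
    """Hypothetical cross-attention fusion (an assumption, not the paper's exact design).

    Main-view tokens act as queries; low-level auxiliary-view tokens supply
    keys and values. The attended auxiliary information is added back to the
    main view through a residual connection.
    """
    q = main_tokens @ w_q  # (n_main, d) queries from the main view
    k = aux_tokens @ w_k   # (n_aux, d) keys from the auxiliary view
    v = aux_tokens @ w_v   # (n_aux, d) values from the auxiliary view
    d = q.shape[-1]
    # Scaled dot-product attention weights; each row sums to 1.
    weights = softmax(q @ k.T / np.sqrt(d), axis=-1)  # (n_main, n_aux)
    # Residual connection keeps the main view as the primary signal.
    fused = main_tokens + weights @ v
    return fused, weights


# Toy example with random tokens (hypothetical sizes: 14x14 = 196 patch tokens, width 64).
rng = np.random.default_rng(0)
n_main, n_aux, d = 196, 196, 64
main = rng.standard_normal((n_main, d))
aux = rng.standard_normal((n_aux, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

fused, weights = fuse_views(main, aux, w_q, w_k, w_v)
print(fused.shape, weights.shape)  # (196, 64) (196, 196)
```

The residual design means that if the value projection contributes nothing (for example, `w_v` set to zeros), the output reduces to the unchanged main-view tokens. In a full model, the fused tokens would then pass to the deeper encoder layers that learn high-level expression features; that part is omitted here.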