CV, Jan 26, 2023

Facial Expression Recognition using Squeeze and Excitation-powered Swin Transformers

Apple, Stanford

arXiv:2301.10906v71.54 citations, h-index: 14

Originality: Incremental advance

AI Analysis

This work targets accurate and efficient facial emotion recognition for applications such as human-computer interaction and mental health. It is rated incremental because it builds on existing transformer and SE methods.

The paper proposes a facial emotion recognition framework that combines Swin Transformers with squeeze-and-excitation blocks, aiming to stay efficient when training data is limited. It reports an F1-score of 0.5420 on the AffectNet dataset, which surpassed the winner of the Affective Behavior Analysis in the Wild (ABAW) Competition held at ECCV 2022.

The ability to recognize and interpret facial emotions is a critical component of human communication, as it allows individuals to understand and respond to emotions conveyed through facial expressions and vocal tones. Recognizing facial emotions is a complex cognitive process that integrates visual and auditory information with prior knowledge and social cues. It plays a crucial role in social interaction, affective processing, and empathy, and it matters for many real-world applications, including human-computer interaction, virtual assistants, and mental health diagnosis and treatment. Accurate and efficient models for facial emotion recognition are therefore of great importance and could have a significant impact on various fields of study.

Facial Emotion Recognition (FER) is of great significance in computer vision and artificial intelligence, with vast commercial and academic potential in fields such as security, advertising, and entertainment. We propose a FER framework that employs Swin Vision Transformers (SwinT) and a squeeze-and-excitation (SE) block to address vision tasks. The approach combines a transformer model with an attention mechanism, SE, and SAM to improve the efficiency of the model, as transformers often require a large amount of data. Our focus was to create an efficient FER model based on the SwinT architecture that can recognize facial emotions using minimal data. We trained our model on a hybrid dataset and evaluated its performance on the AffectNet dataset, achieving an F1-score of 0.5420, which surpassed the winner of the Affective Behavior Analysis in the Wild (ABAW) Competition held at the European Conference on Computer Vision (ECCV) 2022 [Kollias].
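The squeeze-and-excitation step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it follows the general SE recipe (global average pooling, a two-layer bottleneck, sigmoid gating, channel-wise rescaling) in plain Python. The function name `se_block`, the weight matrices `w1` and `w2`, the omission of bias terms, and the toy input are all assumptions made for illustration. How the SE block is attached to the Swin Transformer is not described here, so the sketch leaves that out.

```python
import math


def se_block(feature_map, w1, w2):
    """Apply squeeze-and-excitation to a feature map shaped [C][H][W].

    w1 is a (C/r x C) weight matrix and w2 is a (C x C/r) weight matrix,
    where r is the reduction ratio. Both are hypothetical placeholders for
    learned parameters; bias terms are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, layer 1: bottleneck projection (C -> C/r) followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(weights, squeezed)))
        for weights in w1
    ]
    # Excitation, layer 2: expansion (C/r -> C) followed by a sigmoid,
    # producing one gate in (0, 1) per channel.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]
    return scaled, gates


# Toy input: 4 channels of size 2x2, reduction ratio r = 2 (assumed values).
toy_map = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(4)]
toy_w1 = [[0.1, 0.1, 0.1, 0.1], [-0.1, -0.1, -0.1, -0.1]]
toy_w2 = [[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
toy_scaled, toy_gates = se_block(toy_map, toy_w1, toy_w2)
```

The output keeps the input's shape. Only the relative weight of each channel changes, which is what lets the block emphasize informative channels at little extra cost.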
