CV · Jul 7, 2021

Learning Vision Transformer with Squeeze and Excitation for Facial Expression Recognition

arXiv:2107.03107v4 · 77 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses data scarcity in FER, a domain-specific problem for emotion analysis, with an incremental improvement over existing methods.

The paper tackles the challenge of limited training data in Facial Expression Recognition (FER) by combining a vision Transformer with a Squeeze and Excitation block, achieving state-of-the-art performance on the CK+ and SFEW databases and competitive results on JAFFE and RAF-DB.

As various databases of facial expressions have become accessible over the last few decades, the Facial Expression Recognition (FER) task has attracted considerable interest. The multiple sources of the available databases raise several challenges for the facial recognition task. These challenges are usually addressed by Convolutional Neural Network (CNN) architectures. Unlike CNN models, a Transformer model based on the attention mechanism has recently been introduced to address vision tasks. One of the major issues with Transformers is the need for large amounts of training data, while most FER databases are small compared to those of other vision applications. Therefore, we propose in this paper to learn a vision Transformer jointly with a Squeeze and Excitation (SE) block for the FER task. The proposed method is evaluated on several publicly available FER databases, including CK+, JAFFE, RAF-DB and SFEW. Experiments demonstrate that our model outperforms state-of-the-art methods on CK+ and SFEW and achieves competitive results on JAFFE and RAF-DB.
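For readers unfamiliar with the Squeeze and Excitation idea named in the abstract, here is a minimal, dependency-free sketch of a generic SE block. The abstract does not say where the block sits in the Transformer or what sizes it uses, so everything here is an assumption made for illustration: the block is applied to a sequence of token embeddings, the squeeze step averages over tokens, biases are omitted, and the function name `se_block`, the weight names, and all dimensions are hypothetical. It should not be read as the authors' implementation.

```python
import math
import random


def se_block(tokens, w_reduce, w_expand):
    """Generic Squeeze-and-Excitation over a sequence of token embeddings.

    tokens:   list of N token vectors, each of length C
    w_reduce: C x R weight matrix (bottleneck, typically R < C)
    w_expand: R x C weight matrix
    Biases are omitted to keep the sketch short (an assumption).
    """
    n_tokens = len(tokens)
    channels = len(tokens[0])
    reduced = len(w_reduce[0])

    # Squeeze: average each channel over all tokens -> one vector of length C.
    squeezed = [sum(tok[c] for tok in tokens) / n_tokens for c in range(channels)]

    # Excitation, step 1: project down to R units and apply ReLU.
    hidden = [
        max(0.0, sum(squeezed[c] * w_reduce[c][r] for c in range(channels)))
        for r in range(reduced)
    ]

    # Excitation, step 2: project back to C units and apply a sigmoid,
    # giving one gate between 0 and 1 per channel.
    gates = []
    for c in range(channels):
        pre_activation = sum(hidden[r] * w_expand[r][c] for r in range(reduced))
        gates.append(1.0 / (1.0 + math.exp(-pre_activation)))

    # Scale: reweight every token channel-wise by its gate.
    scaled = [[tok[c] * gates[c] for c in range(channels)] for tok in tokens]
    return scaled, gates


# Hypothetical sizes for illustration only: 4 tokens, 8 channels, bottleneck of 2.
rng = random.Random(0)
tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
w_reduce = [[rng.gauss(0, 0.5) for _ in range(2)] for _ in range(8)]
w_expand = [[rng.gauss(0, 0.5) for _ in range(8)] for _ in range(2)]

scaled, gates = se_block(tokens, w_reduce, w_expand)
```

The output has the same shape as the input; only the relative weight of each channel changes. In a trained model `w_reduce` and `w_expand` would be learned jointly with the rest of the network rather than drawn at random as they are here.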

