FERAtt: Facial Expression Recognition with Attention Net
This work addresses facial expression recognition for computer vision applications, but it is incremental as it builds on existing datasets and methods.
The authors tackled facial expression recognition by proposing an end-to-end network with an attention model and Gaussian space representation, achieving superior results compared to a PreActResNet18 baseline on synthetic datasets derived from BU3DFE and CK+.
We present a new end-to-end network architecture for facial expression recognition with an attention model. It focuses attention on the human face and uses a Gaussian space representation for expression recognition. The architecture is built from two complementary components: (1) facial image correction and attention and (2) facial expression representation and classification. The first component combines an encoder-decoder style network with a convolutional feature extractor; their outputs are multiplied pixel-wise to obtain a feature attention map. The second component produces an embedded representation of the facial expression and classifies it. We propose a loss function that imposes a Gaussian structure on the representation space. To evaluate the proposed method, we build two larger and more comprehensive synthetic datasets from the traditional BU3DFE and CK+ facial datasets, and we compare our results with a PreActResNet18 baseline. Our experiments on these datasets show that our approach outperforms the baseline in recognizing facial expressions.
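The two mechanisms named in the abstract, a pixel-wise product that yields the feature attention map and a loss that gives the representation space a Gaussian structure, can be illustrated with a minimal, dependency-free sketch. The function names `attention_map` and `gaussian_nll` are hypothetical, and the abstract does not specify the exact form of the loss; the isotropic Gaussian negative log-likelihood around a per-class center used here is an assumption chosen for illustration, not necessarily the authors' formulation.

```python
import math


def attention_map(mask, features):
    """Multiply a mask and a feature map element by element.

    Both arguments are H x W nested lists standing in for the
    encoder-decoder output and the convolutional features
    (illustrative stand-ins, not the network's real tensors).
    """
    same_rows = len(mask) == len(features)
    if not same_rows or any(len(m) != len(f) for m, f in zip(mask, features)):
        raise ValueError("mask and features must have the same shape")
    return [
        [m * f for m, f in zip(mask_row, feat_row)]
        for mask_row, feat_row in zip(mask, features)
    ]


def gaussian_nll(embedding, center, sigma=1.0):
    """Negative log-density of N(center, sigma^2 I) evaluated at embedding.

    Assumption: one isotropic Gaussian per expression class. Minimizing
    this term pulls an embedding toward the center of its class.
    """
    dim = len(embedding)
    squared_distance = sum((e - c) ** 2 for e, c in zip(embedding, center))
    return (
        squared_distance / (2.0 * sigma ** 2)
        + dim * math.log(sigma)
        + 0.5 * dim * math.log(2.0 * math.pi)
    )


if __name__ == "__main__":
    mask = [[1.0, 0.0], [0.5, 1.0]]      # high values mark face pixels
    features = [[2.0, 3.0], [4.0, 5.0]]  # toy feature map
    print(attention_map(mask, features))
    print(gaussian_nll([1.0, 0.0], [0.0, 0.0]))
```

In this sketch, mask values near zero suppress the corresponding features, which is the sense in which the product acts as attention, and the loss term grows with the squared distance between an embedding and the center of its class.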