An Ensemble Approach for Multiple Emotion Descriptors Estimation Using Multi-task Learning
This is an incremental improvement to in-the-wild emotion recognition, addressing the Multi-Task Learning Challenge of the ABAW competition.
The paper tackles multi-task emotion analysis by combining face and surrounding-context information with InceptionNet V3 features, an attention mechanism, a transformer, and MLPs to predict arousal, valence, emotional expression, and action units. It achieves a performance score of 0.917 on the MTL Challenge validation set.
This paper describes the method we submitted to the Multi-Task Learning (MTL) Challenge of the fourth Affective Behavior Analysis in-the-Wild (ABAW) Competition. Instead of using only facial information, we use all the information in the provided dataset, which contains both the face and the context around it. We use the InceptionNet V3 model to extract deep features and then apply an attention mechanism to refine them. The refined features are fed into a transformer block and multi-layer perceptron (MLP) networks to produce the final emotion descriptors. Our model simultaneously predicts arousal and valence, classifies the emotional expression, and estimates the action units. The proposed system achieves a performance score of 0.917 on the MTL Challenge validation set.
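The pipeline described above can be illustrated with a minimal forward-pass sketch in NumPy. This is not the authors' implementation, and every name in it (`MultiTaskSketch`, `dense`, `apply`, `mlp`) is hypothetical. It assumes a 299x299 input whose InceptionV3 feature map is 8x8x2048, flattened into 64 tokens; it substitutes a squeeze-and-excitation style channel gate for the unspecified attention mechanism; it uses a single-head, single-layer transformer encoder without positional encoding; and it ends in small two-layer MLP heads. The head sizes (8 expression classes, 12 action units) follow the usual Aff-Wild2 MTL setup and should be read as assumptions. All weights are random, so the outputs only demonstrate shapes and value ranges.

```python
import numpy as np

# Assumptions (not from the paper): 8 expression classes and 12 AUs, as in the
# Aff-Wild2-based MTL setup; widths and the SE-style gate are illustrative.
N_EXPR, N_AU = 8, 12
C, D, R = 2048, 256, 16  # InceptionV3 channels, hypothetical model width, SE reduction

rng = np.random.default_rng(0)


def dense(d_in, d_out):
    """Randomly initialised (W, b); a trained model would learn these."""
    return rng.standard_normal((d_in, d_out)) / np.sqrt(d_in), np.zeros(d_out)


def apply(layer, x):
    w, b = layer
    return x @ w + b


def mlp(layers, x):
    """Linear -> ReLU -> Linear."""
    hidden, out = layers
    return apply(out, np.maximum(apply(hidden, x), 0.0))


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)


class MultiTaskSketch:
    def __init__(self):
        self.se = (dense(C, C // R), dense(C // R, C))      # channel-attention bottleneck
        self.proj = dense(C, D)                              # project tokens to model width
        self.wq, self.wk, self.wv, self.wo = (dense(D, D) for _ in range(4))
        self.ffn = (dense(D, 4 * D), dense(4 * D, D))        # transformer feed-forward
        self.va_head = (dense(D, D // 2), dense(D // 2, 2))
        self.expr_head = (dense(D, D // 2), dense(D // 2, N_EXPR))
        self.au_head = (dense(D, D // 2), dense(D // 2, N_AU))

    def __call__(self, fmap):
        """fmap: (N, C) feature map, e.g. an 8x8x2048 InceptionV3 output flattened to (64, 2048)."""
        # 1) Squeeze-and-excitation style channel attention refines the features.
        gate = sigmoid(mlp(self.se, fmap.mean(axis=0)))     # (C,), one weight per channel
        x = apply(self.proj, fmap * gate)                    # (N, D)

        # 2) One transformer encoder layer: single-head self-attention + FFN, post-norm.
        q, k, v = apply(self.wq, x), apply(self.wk, x), apply(self.wv, x)
        attn = softmax(q @ k.T / np.sqrt(D))                 # (N, N) attention weights
        x = layer_norm(x + apply(self.wo, attn @ v))
        x = layer_norm(x + mlp(self.ffn, x))

        # 3) Mean-pool the tokens, then one MLP head per task.
        z = x.mean(axis=0)                                   # (D,)
        return {
            "valence_arousal": np.tanh(mlp(self.va_head, z)),   # each value in [-1, 1]
            "expression": softmax(mlp(self.expr_head, z)),      # distribution over 8 classes
            "action_units": sigmoid(mlp(self.au_head, z)),      # independent per-AU probabilities
        }


model = MultiTaskSketch()
fmap = rng.standard_normal((64, C))  # stand-in for a real InceptionV3 feature map
out = model(fmap)
print({k: v.shape for k, v in out.items()})
# {'valence_arousal': (2,), 'expression': (8,), 'action_units': (12,)}
```

The three heads use different output activations because the tasks differ: valence and arousal are continuous and bounded, expression is a single-label classification, and action units are multi-label, so each AU gets its own independent probability.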