CV · Mar 18, 2024

HSEmotion Team at the 6th ABAW Competition: Facial Expressions, Valence-Arousal and Emotion Intensity Prediction

arXiv:2403.11590v1 · 13.520 citations · h-index: 4

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for reliable facial emotion prediction in real-world applications, but it is incremental as it builds on existing architectures and multi-task learning approaches.

The authors tackled the problem of improving trustworthiness in facial emotion analysis by using pre-trained deep models to extract emotional features without fine-tuning, achieving significant improvements in quality metrics on validation sets for five tasks in the ABAW competition.

This article presents our results for the sixth Affective Behavior Analysis in-the-wild (ABAW) competition. To improve the trustworthiness of facial analysis, we study the possibility of using pre-trained deep models that extract reliable emotional features without the need to fine-tune the neural networks for a downstream task. In particular, we introduce several lightweight models based on the MobileViT, MobileFaceNet, EfficientNet, and DDAMFN architectures, trained in multi-task scenarios to recognize facial expressions, valence, and arousal on static photos. These neural networks extract frame-level features that are fed into a simple classifier, e.g., a linear feed-forward neural network, to predict emotion intensity, compound expressions, action units, facial expressions, and valence/arousal. Experimental results for five tasks from the sixth ABAW challenge demonstrate that our approach significantly improves quality metrics on validation sets compared to existing non-ensemble techniques.
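The pipeline described in the abstract (frozen feature extractor, then a simple linear classifier on frame-level features) can be illustrated with a minimal sketch. This is not the authors' code: the embeddings below are synthetic stand-ins for the output of a pre-trained extractor, and the names `train_linear_head` and `predict`, the feature dimension, the class count, and the optimizer settings are all assumptions chosen for illustration.

```python
import numpy as np


def train_linear_head(features, labels, num_classes, lr=0.5, epochs=200):
    """Fit a softmax linear classifier on fixed (frozen) features.

    Only W and b are updated; the features are treated as constants,
    which mirrors using a pre-trained extractor without fine-tuning.
    """
    n, d = features.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(features, W, b):
    """Return the most likely class index for each feature vector."""
    return np.argmax(features @ W + b, axis=1)


# Synthetic stand-in for frame-level embeddings (assumption: 16-dim features,
# 8 classes). A real pipeline would obtain these from the frozen network.
rng = np.random.default_rng(0)
num_classes, dim, per_class = 8, 16, 50
centers = rng.normal(scale=3.0, size=(num_classes, dim))
labels = np.repeat(np.arange(num_classes), per_class)
features = centers[labels] + rng.normal(size=(labels.size, dim))

# Standardize each feature dimension so the fixed learning rate is stable.
features = (features - features.mean(axis=0)) / features.std(axis=0)

W, b = train_linear_head(features, labels, num_classes)
accuracy = float((predict(features, W, b) == labels).mean())
print(f"training accuracy on synthetic features: {accuracy:.3f}")
```

Because the extractor stays fixed, embeddings can be computed once per frame and reused across tasks; only the small head is trained per task. For regression-style targets such as valence/arousal, the softmax head in this sketch would need to be replaced by a regression output.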

