Multi-modal Facial Action Unit Detection with Large Pre-trained Models for the 5th Competition on Affective Behavior Analysis in-the-wild
This work addresses facial expression analysis for affective computing applications. Its contribution is incremental, since it builds on existing pre-trained models and an established competition framework.
The paper tackles facial action unit detection by proposing a multi-modal method that uses visual, acoustic, and lexical features from large pre-trained models, achieving an F1 score of 52.3% on the official validation set of the ABAW 2023 Competition.
Facial action unit (AU) detection has emerged as an important task within facial expression analysis. It aims to detect specific pre-defined, objective facial expressions, such as lip tightening and cheek raising. This paper presents our submission to the Affective Behavior Analysis in-the-wild (ABAW) 2023 Competition for AU detection. We propose a multi-modal method for facial action unit detection with visual, acoustic, and lexical features extracted from large pre-trained models. To provide high-quality details for visual feature extraction, we apply super-resolution and face alignment to the training data and show a potential performance gain. Our approach achieves an F1 score of 52.3% on the official validation set of the 5th ABAW Challenge.
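To make the multi-modal setup and the reported metric concrete, the following is a minimal sketch and not the submission's actual pipeline. It assumes that per-frame visual, acoustic, and lexical feature vectors are fused by simple concatenation, and that the F1 score is computed per AU and then averaged over AUs. The abstract specifies neither the fusion strategy nor the averaging, so both are assumptions, and the function names `fuse_features`, `f1_binary`, and `macro_f1` are hypothetical.

```python
from typing import List, Sequence


def fuse_features(
    visual: Sequence[float],
    acoustic: Sequence[float],
    lexical: Sequence[float],
) -> List[float]:
    """Concatenate the three per-frame feature vectors.

    Assumption: plain concatenation; the source does not state how
    the modalities are combined.
    """
    return list(visual) + list(acoustic) + list(lexical)


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score for a single AU with binary (0/1) labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Return 0.0 when the AU never occurs and is never predicted.
    return 2 * tp / denom if denom else 0.0


def macro_f1(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
) -> float:
    """Unweighted mean of per-AU F1 scores.

    Rows are frames and columns are AUs. Averaging over AUs is an
    assumption about how the single reported F1 figure is obtained.
    """
    num_aus = len(y_true[0])
    scores = [
        f1_binary([row[k] for row in y_true], [row[k] for row in y_pred])
        for k in range(num_aus)
    ]
    return sum(scores) / num_aus


if __name__ == "__main__":
    # Toy example: 3 frames, 2 AUs.
    labels = [[1, 0], [1, 1], [0, 1]]
    preds = [[1, 0], [0, 1], [0, 1]]
    print(f"macro F1 = {macro_f1(labels, preds):.4f}")
    print(fuse_features([0.1, 0.2], [0.3], [0.4, 0.5]))
```

In practice the fused vector would feed a multi-label classifier with one sigmoid output per AU, whose thresholded predictions form the `y_pred` matrix scored here.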