Frame-level Prediction of Facial Expressions, Valence, Arousal and Action Units for Mobile Devices
This work addresses the problem of efficient emotion recognition for mobile applications, but it is incremental as it builds on existing models and datasets.
The paper tackles real-time facial emotion analytics on mobile devices by proposing a frame-level algorithm built on a pre-trained EfficientNet model, which achieves performance measures 0.15-0.2 higher than a VggFace baseline on the validation sets of the Aff-Wild2 database.
In this paper, we consider the problem of real-time video-based facial emotion analytics, namely, facial expression recognition, prediction of valence and arousal, and detection of action units. We propose a novel frame-level emotion recognition algorithm that extracts facial features with a single EfficientNet model pre-trained on AffectNet. As a result, our approach may be implemented even for video analytics on mobile devices. Experimental results for the large-scale Aff-Wild2 database from the third Affective Behavior Analysis in-the-wild (ABAW) Competition demonstrate that our simple model is significantly better than the VggFace baseline. In particular, our method achieves 0.15-0.2 higher performance measures on the validation sets in uni-task Expression Classification, Valence-Arousal Estimation and Action Unit Detection. Due to its simplicity, our approach may be considered as a new baseline for all four sub-challenges.
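The frame-level idea described above, one shared feature vector per frame feeding several lightweight predictors, can be illustrated with a minimal sketch. This is not the authors' implementation. The `extract_features` placeholder stands in for the pre-trained EfficientNet, the head weights are random rather than trained, and the dimensions, the `tanh` squashing for valence and arousal, and the 0.5 threshold for action units are all assumptions made only for illustration.

```python
import math
import random

# All sizes below are illustrative assumptions, not values from the paper.
EMBED_DIM = 8          # a real EfficientNet embedding is far larger
NUM_EXPRESSIONS = 8    # hypothetical number of expression classes
NUM_AUS = 12           # hypothetical number of action units


def make_linear(n_out, n_in, rng):
    """Random weights and zero biases standing in for a trained linear head."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(params, x):
    """Apply a linear layer: one dot product plus bias per output unit."""
    weights, biases = params
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def extract_features(face_pixels):
    """Hypothetical placeholder for the shared pre-trained feature extractor.

    It averages fixed-size chunks of the input so that the sketch runs
    without a deep learning framework.
    """
    chunk = max(1, len(face_pixels) // EMBED_DIM)
    return [sum(face_pixels[i * chunk:(i + 1) * chunk]) / chunk
            for i in range(EMBED_DIM)]


def predict_frame(face_pixels, heads):
    """Predict expression, valence/arousal and action units for one frame."""
    features = extract_features(face_pixels)  # computed once, shared by all heads
    expr_probs = softmax(linear(heads["expression"], features))
    valence, arousal = (math.tanh(v)
                        for v in linear(heads["valence_arousal"], features))
    au_probs = [sigmoid(v) for v in linear(heads["action_units"], features)]
    return {
        "expression": max(range(NUM_EXPRESSIONS), key=expr_probs.__getitem__),
        "valence": valence,
        "arousal": arousal,
        "action_units": [p > 0.5 for p in au_probs],
    }


rng = random.Random(0)
heads = {
    "expression": make_linear(NUM_EXPRESSIONS, EMBED_DIM, rng),
    "valence_arousal": make_linear(2, EMBED_DIM, rng),
    "action_units": make_linear(NUM_AUS, EMBED_DIM, rng),
}
frame = [rng.random() for _ in range(64)]  # stand-in for a cropped face image
result = predict_frame(frame, heads)
```

Because the expensive feature extraction runs once per frame and each head is a single small linear layer, the per-frame cost in this sketch is dominated by the shared extractor. That is consistent with the abstract's point that a single model makes the approach practical on mobile devices.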