CV | Mar 25, 2022

Frame-level Prediction of Facial Expressions, Valence, Arousal and Action Units for Mobile Devices

arXiv:2203.13436v2 | 37 citations | h-index: 23
AI Analysis

This work addresses the problem of efficient emotion recognition for mobile applications, but it is incremental as it builds on existing models and datasets.

The paper tackled real-time facial emotion analytics on mobile devices by proposing a frame-level algorithm using a pre-trained EfficientNet model, achieving 0.15-0.2 higher performance measures compared to a VggFace baseline on the Aff-Wild2 database.

In this paper, we consider the problem of real-time video-based facial emotion analytics, namely, facial expression recognition, prediction of valence and arousal, and detection of action units. We propose a novel frame-level emotion recognition algorithm that extracts facial features with a single EfficientNet model pre-trained on AffectNet. As a result, our approach may be implemented even for video analytics on mobile devices. Experimental results for the large-scale Aff-Wild2 database from the third Affective Behavior Analysis in-the-wild (ABAW) Competition demonstrate that our simple model is significantly better than the VggFace baseline. In particular, our method achieves performance measures 0.15-0.2 higher on the validation sets in uni-task Expression Classification, Valence-Arousal Estimation and Action Unit Detection. Owing to its simplicity, our approach may be considered as a new baseline for all four sub-challenges.
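The frame-level idea in the abstract (one shared feature vector per face, then separate predictions for expression, valence-arousal and action units) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature extractor below is a hypothetical stand-in for a pre-trained network such as EfficientNet, the randomly initialised linear heads are an assumption about how the three outputs could be produced, and `FEATURE_DIM`, `NUM_EXPRESSIONS` and `NUM_ACTION_UNITS` are illustrative values that do not come from the abstract.

```python
import math
import random

# Hypothetical sizes, chosen for illustration only (not taken from the paper).
FEATURE_DIM = 16
NUM_EXPRESSIONS = 8
NUM_ACTION_UNITS = 12


def extract_features(frame_id):
    """Stand-in for a pre-trained CNN embedding of one cropped face.

    A real system would pass the face crop through a network such as
    EfficientNet; here a deterministic pseudo-random vector is derived
    from the frame id so the sketch runs without any model weights.
    """
    rng = random.Random(frame_id)
    return [rng.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)]


def make_linear(out_dim, in_dim, seed):
    """Create a small random weight matrix (assumed, untrained head)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def linear(weights, x):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


# One head per task, all reading the same per-frame feature vector.
W_EXPR = make_linear(NUM_EXPRESSIONS, FEATURE_DIM, seed=0)
W_VA = make_linear(2, FEATURE_DIM, seed=1)
W_AU = make_linear(NUM_ACTION_UNITS, FEATURE_DIM, seed=2)


def predict_frame(frame_id):
    """Predict all three outputs for a single frame, independently of other frames."""
    feats = extract_features(frame_id)
    probs = softmax(linear(W_EXPR, feats))
    valence, arousal = (math.tanh(v) for v in linear(W_VA, feats))
    au_probs = [sigmoid(v) for v in linear(W_AU, feats)]
    return {
        "expression": max(range(NUM_EXPRESSIONS), key=probs.__getitem__),
        "expression_probs": probs,
        "valence": valence,   # squashed into [-1, 1] by tanh
        "arousal": arousal,   # squashed into [-1, 1] by tanh
        "action_units": [p > 0.5 for p in au_probs],
    }


# Each frame is processed on its own, so the features are computed once per frame.
predictions = [predict_frame(frame_id) for frame_id in range(5)]
```

Because the expensive step (feature extraction) is shared and the heads are cheap, a design of this shape keeps the per-frame cost close to one forward pass of the backbone, which fits the abstract's point about feasibility on mobile devices.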

Code Implementations: 2 repos