CV HC LG ML | Oct 15, 2019

Face Behavior a la carte: Expressions, Affect and Action Units in a Single Network

Dimitrios Kollias, Viktoriia Sharmanska, Stefanos Zafeiriou

arXiv:1910.11111v327.0249 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of fragmented datasets and independent models in facial behavior analysis for computer vision and psychology applications, though it is incremental in combining existing tasks.

The authors train a single multi-task network for three facial behavior analysis tasks (basic expressions, continuous emotions, action units) on all publicly available in-the-wild datasets (around 5M images). They report consistently better performance than single-task networks, and zero- and few-shot transfer to a task the network was not trained on (compound emotion recognition).

Automatic facial behavior analysis has a long history of study at the intersection of computer vision, physiology and psychology. However, it is only recently, with the collection of large-scale datasets and powerful machine learning methods such as deep neural networks, that automatic facial behavior analysis has started to thrive. Three of its iconic tasks are automatic recognition of basic expressions (e.g. happy, sad, surprised), estimation of continuous emotions (e.g. valence and arousal), and detection of facial action units (activations of e.g. upper/inner eyebrows, nose wrinkles). Until now these tasks have mostly been studied independently, each with a dataset collected for that task alone. We present the first and the largest study of all facial behavior tasks learned jointly in a single multi-task, multi-domain and multi-label network, which we call FaceBehaviorNet. For this we utilize all publicly available datasets in the community (around 5M images) that study facial behavior tasks in-the-wild. We demonstrate that jointly training an end-to-end network for all tasks performs consistently better than training each of the single-task networks. Furthermore, we propose two simple strategies for coupling the tasks during training, co-annotation and distribution matching, and show the advantages of this approach. Finally, we show that FaceBehaviorNet has learned features that encapsulate all aspects of facial behavior, and can be successfully applied, in a zero- and few-shot learning setting, to tasks beyond the ones it was trained on (compound emotion recognition).
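The multi-task, multi-domain setup described in the abstract implies that each training image carries labels for only some of the three tasks, since the datasets were collected independently. A minimal sketch of how a joint loss could skip missing annotations is given below. It is an illustration under stated assumptions, not the authors' implementation: the function names, the dictionary keys, the equal task weighting and the choice of mean squared error for valence and arousal are all hypothetical, and the co-annotation and distribution matching strategies are not modeled.

```python
import math

# Hypothetical sketch: per-sample joint loss over three facial behavior tasks.
# Task names, loss choices and equal weighting are assumptions for illustration.


def softmax_cross_entropy(logits, label):
    """Cross-entropy for one categorical label (basic expression)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def binary_cross_entropy(logits, targets):
    """Mean sigmoid cross-entropy over independent labels (action units)."""
    total = 0.0
    for z, t in zip(logits, targets):
        # Numerically stable form of -[t*log(s) + (1-t)*log(1-s)], s = sigmoid(z)
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def mean_squared_error(pred, target):
    """Stand-in regression loss for valence and arousal."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(outputs, labels):
    """Sum the losses of the tasks this sample is annotated for; skip the rest."""
    loss = 0.0
    if labels.get("expression") is not None:
        loss += softmax_cross_entropy(outputs["expression"], labels["expression"])
    if labels.get("valence_arousal") is not None:
        loss += mean_squared_error(outputs["valence_arousal"], labels["valence_arousal"])
    if labels.get("action_units") is not None:
        loss += binary_cross_entropy(outputs["action_units"], labels["action_units"])
    return loss


# One network output, scored against a sample that only has action unit labels.
outputs = {
    "expression": [0.0, 0.0, 0.0],
    "valence_arousal": [0.5, -0.5],
    "action_units": [0.0, 0.0],
}
au_only_loss = joint_loss(outputs, {"action_units": [1, 0]})
```

In this sketch a sample from an action unit dataset contributes only the action unit term, so images from differently annotated datasets can share one batch and one backbone without placeholder labels.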
