Learning Grimaces by Watching TV
This work addresses the challenge of learning facial expressions in machine vision systems without explicit supervision, though it is incremental in that it builds on existing deep learning methods.
The paper tackles the problem of learning facial expressions from videos by relating them to objectively measurable events, such as those in a gameshow. It also develops state-of-the-art deep neural networks for facial expression recognition, achieving state-of-the-art performance on benchmarks such as FER and SFEW 2.0.
Unlike computer vision systems, which require explicit supervision, humans can learn facial expressions by observing people in their environment. In this paper, we look at how similar capabilities could be developed in machine vision. As a starting point, we consider the problem of relating facial expressions to objectively measurable events occurring in videos. In particular, we consider a gameshow in which contestants play to win significant sums of money. We extract events affecting the game, together with the corresponding facial expressions, objectively and automatically from the videos, obtaining large quantities of labelled data for our study. Using benchmarks such as FER and SFEW 2.0, we also develop state-of-the-art deep neural networks for facial expression recognition, showing that pre-training on face verification data can be highly beneficial for this task. We then extend these models to use facial expressions to predict events in videos and to learn nameable expressions from them. The dataset and emotion recognition models are available at http://www.robots.ox.ac.uk/~vgg/data/facevalue.