CV · Nov 13, 2017

Convolutional neural networks pretrained on large face recognition datasets for emotion classification from video

Boris Knyazev, Roman Shvetsov, Natalia Efremova, Artem Kuharenko

arXiv:1711.04598v1 · 13.263 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses emotion recognition from video. Its contribution is incremental: it builds on existing methods by reusing networks pretrained for face recognition.

The authors classify emotions in video with an ensemble of models that capture spatial and audio features. The ensemble reaches 60.03% accuracy on the EmotiW 2017 test set, about 1% above the previous best result, without using visual temporal information.

In this paper we describe our entry for the emotion recognition challenge EmotiW 2017. We propose an ensemble of several models that capture spatial and audio features from videos. Spatial features are captured by convolutional neural networks pretrained on large face recognition datasets. We show that using strong industry-level face recognition networks increases the accuracy of emotion recognition. Using our ensemble, we improve on the previous best result on the test set by about 1%, achieving 60.03% classification accuracy without any use of visual temporal information.
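The abstract does not say how per-frame features are aggregated or how the ensemble members are combined, so the following is only a minimal sketch of one common approach, not the authors' method. It assumes that frame-level features are averaged without regard to order (consistent with using no visual temporal information) and that each model outputs one score per class, which are then merged by a weighted average. The label list, the model names `face_cnn` and `audio`, and the weights are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical label set for illustration; the abstract does not list the classes.
EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]


def mean_pool(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature vectors into a single video-level vector.

    Frame order is ignored, so no visual temporal information is retained.
    """
    if not frame_features:
        raise ValueError("need at least one frame")
    n_frames = len(frame_features)
    dim = len(frame_features[0])
    return [sum(frame[i] for frame in frame_features) / n_frames for i in range(dim)]


def fuse_scores(
    model_scores: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Combine per-model class scores with a normalized weighted average."""
    total_weight = sum(weights[name] for name in model_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = [0.0] * len(EMOTIONS)
    for name, scores in model_scores.items():
        if len(scores) != len(EMOTIONS):
            raise ValueError(f"model {name!r} returned {len(scores)} scores")
        w = weights[name] / total_weight
        for i, score in enumerate(scores):
            fused[i] += w * score
    return fused


def predict(model_scores: Dict[str, Sequence[float]], weights: Dict[str, float]) -> str:
    """Return the label with the highest fused score."""
    fused = fuse_scores(model_scores, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best]


# Made-up scores for one video from a spatial (face) model and an audio model.
example_scores = {
    "face_cnn": [0.10, 0.00, 0.10, 0.60, 0.10, 0.05, 0.05],
    "audio": [0.30, 0.00, 0.10, 0.20, 0.20, 0.10, 0.10],
}
example_weights = {"face_cnn": 0.7, "audio": 0.3}
example_label = predict(example_scores, example_weights)
```

In practice the weights would be tuned on a validation set, and the fusion could equally be done on concatenated features rather than on class scores; the abstract does not say which the authors chose.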
