How Deep Neural Networks Can Improve Emotion Recognition on Video Data
This work addresses emotion recognition in video, but its contribution is incremental because it builds on existing deep learning approaches.
The paper tackles dimensional emotion recognition on video data by combining convolutional neural networks (CNNs) with recurrent neural networks (RNNs), analyzing how much each component contributes, and outperforming the baseline and other competing methods on the AV+EC2015 dataset.
We consider the task of dimensional emotion recognition on video data using deep learning. While several previous methods have shown the benefits of training temporal neural network models such as recurrent neural networks (RNNs) on hand-crafted features, few works have considered combining convolutional neural networks (CNNs) with RNNs. In this work, we present a system that performs emotion recognition on video data using both CNNs and RNNs, and we also analyze how much each neural network component contributes to the system's overall performance. We present our findings on videos from the Audio/Visual+Emotion Challenge (AV+EC2015). In our experiments, we analyze the effects of several hyperparameters on overall performance while also achieving superior performance to the baseline and other competing methods.
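To make the CNN-plus-RNN arrangement described above concrete, the following is a minimal sketch of the data flow only, not the authors' architecture. A convolutional stage turns each frame into a small feature vector, and a recurrent stage carries state across frames to emit one score per frame. Every name, size, and weight here is an assumption made for illustration: the kernels and weight matrices are random and untrained, the frames are toy 8x8 arrays, and squashing the output to [-1, 1] with tanh is an assumed convention for a dimensional label such as valence.

```python
import math
import random


def conv2d_valid(image, kernel):
    """Valid 2D cross-correlation of a grayscale image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def frame_features(image, kernels):
    """CNN stage: convolve, apply ReLU, then average-pool each feature map."""
    feats = []
    for kernel in kernels:
        fmap = conv2d_valid(image, kernel)
        values = [max(0.0, v) for row in fmap for v in row]
        feats.append(sum(values) / len(values))
    return feats


def rnn_step(x, h, w_in, w_rec):
    """RNN stage: one Elman update, h_t = tanh(W_in x_t + W_rec h_{t-1})."""
    return [
        math.tanh(
            sum(w_in[k][i] * x[i] for i in range(len(x)))
            + sum(w_rec[k][j] * h[j] for j in range(len(h)))
        )
        for k in range(len(h))
    ]


def predict_sequence(frames, kernels, w_in, w_rec, w_out):
    """Run the CNN on each frame, the RNN across frames, one score per frame."""
    h = [0.0] * len(w_rec)  # initial hidden state
    scores = []
    for image in frames:
        h = rnn_step(frame_features(image, kernels), h, w_in, w_rec)
        # Assumed output convention: a single value squashed into [-1, 1].
        scores.append(math.tanh(sum(w * v for w, v in zip(w_out, h))))
    return scores


def random_matrix(rows, cols, rng):
    """Hypothetical helper: untrained weights drawn uniformly from [-0.5, 0.5]."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_kernels, hidden = 3, 4
kernels = [random_matrix(3, 3, rng) for _ in range(n_kernels)]
w_in = random_matrix(hidden, n_kernels, rng)
w_rec = random_matrix(hidden, hidden, rng)
w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
frames = [random_matrix(8, 8, rng) for _ in range(5)]  # five toy 8x8 frames
scores = predict_sequence(frames, kernels, w_in, w_rec, w_out)
```

In this sketch the contribution of each component could be probed by swapping one stage out, for example by feeding hand-crafted per-frame features into `rnn_step` in place of `frame_features`, or by scoring each frame from its CNN features alone without the recurrent state.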