AI, CL, CV | May 3, 2018

Dimensional emotion recognition using visual and textual cues

arXiv:1805.01416v1
Originality: Synthesis-oriented
AI Analysis

This work addresses emotion recognition for applications like human-computer interaction, but it is incremental as it builds on existing multimodal approaches.

The paper tackles automatic emotion recognition in the OMG-Emotion challenge with a weighted ensemble of models from the video and text modalities, and reports results that clearly outperform the baseline methods on the validation set.

This paper addresses the problem of automatic emotion recognition in the scope of the One-Minute Gradual-Emotional Behavior challenge (OMG-Emotion challenge). The underlying objective of the challenge is the automatic estimation of emotion expressions in the two-dimensional emotion representation space (i.e., arousal and valence). The adopted methodology is a weighted ensemble of several models from both video and text modalities. For video-based recognition, two different types of visual cues (i.e., face and facial landmarks) were considered to feed a multi-input deep neural network. Regarding the text modality, a sequential model based on a simple recurrent architecture was implemented. We also introduce a model based on high-level features in order to embed domain knowledge in the learning process. Experimental results on the OMG-Emotion validation set demonstrate the effectiveness of the implemented ensemble model, as it clearly outperforms the current baseline methods.
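The weighted ensemble described in the abstract can be illustrated with a minimal sketch. It assumes each model outputs one (arousal, valence) pair per clip and that the ensemble is a normalized weighted average; the abstract does not state the combination rule, the weights, or the model names, so `weighted_ensemble`, the keys `video`, `text`, and `high_level`, and all numbers below are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

# One prediction is an (arousal, valence) pair for a single clip.
Prediction = Tuple[float, float]


def weighted_ensemble(
    predictions: Dict[str, Prediction],
    weights: Dict[str, float],
) -> Prediction:
    """Combine per-model (arousal, valence) predictions by weighted average.

    Assumption: a plain normalized weighted mean. The paper may combine
    its models differently.
    """
    total = sum(weights[name] for name in predictions)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    arousal = sum(weights[n] * p[0] for n, p in predictions.items()) / total
    valence = sum(weights[n] * p[1] for n, p in predictions.items()) / total
    return arousal, valence


# Hypothetical per-model outputs for one clip (not results from the paper).
example_predictions = {
    "video": (0.4, 0.2),
    "text": (0.2, -0.2),
    "high_level": (0.6, 0.4),
}
# Hypothetical weights; in practice they would be tuned on validation data.
example_weights = {"video": 0.5, "text": 0.25, "high_level": 0.25}

ensemble_arousal, ensemble_valence = weighted_ensemble(
    example_predictions, example_weights
)
print(ensemble_arousal, ensemble_valence)
```

Dividing by the weight total keeps the output on the same scale as the inputs even when the weights do not sum to one.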

