CV LG ML Feb 9, 2020

Two-Stream Aural-Visual Affect Analysis in the Wild

arXiv:2002.03399v20.0085 citations
AI Analysis 50

This work addresses affect recognition in uncontrolled environments for human-computer interaction and represents an incremental improvement over existing methods.

The paper tackles human affect recognition from in-the-wild videos by proposing a two-stream aural-visual model that processes audio and image streams separately using temporal convolutions and additional face-alignment features, achieving promising results on the Aff-Wild2 database.

Human affect recognition is an essential part of natural human-computer interaction. However, current methods are still in their infancy, especially for in-the-wild data. In this work, we introduce our submission to the Affective Behavior Analysis in-the-wild (ABAW) 2020 competition. We propose a two-stream aural-visual analysis model to recognize affective behavior from videos. Audio and image streams are first processed separately and fed into a convolutional neural network. Instead of applying recurrent architectures for temporal analysis, we use only temporal convolutions. Furthermore, the model is given access to additional features extracted during face alignment. At training time, we exploit correlations between different emotion representations to improve performance. Our model achieves promising results on the challenging Aff-Wild2 database.
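
To make the idea of replacing recurrence with temporal convolutions concrete, here is a minimal, dependency-free sketch. It is not the authors' implementation: the function names `temporal_conv` and `fuse`, the single kernel shared across feature channels, and the concatenation-based fusion are all assumptions chosen for illustration. The abstract does not specify the paper's layer sizes, kernels, or fusion strategy.

```python
from typing import List, Sequence

Vector = List[float]


def temporal_conv(frames: Sequence[Vector], kernel: Sequence[float]) -> List[Vector]:
    """Slide a 1D kernel along the time axis of per-frame feature vectors.

    Hypothetical simplification: the same kernel is applied to every
    feature channel, with no padding ("valid" mode). As is conventional in
    deep learning, the kernel is not flipped (cross-correlation).
    """
    k = len(kernel)
    if len(frames) < k:
        raise ValueError("need at least as many frames as kernel taps")
    dim = len(frames[0])
    return [
        [sum(kernel[j] * frames[t + j][d] for j in range(k)) for d in range(dim)]
        for t in range(len(frames) - k + 1)
    ]


def fuse(audio: Sequence[Vector], visual: Sequence[Vector]) -> List[Vector]:
    """Concatenate audio and visual features at each time step.

    Assumption: both streams are already aligned to the same number of
    time steps; concatenation stands in for whatever fusion the paper uses.
    """
    if len(audio) != len(visual):
        raise ValueError("streams must have the same number of time steps")
    return [list(a) + list(v) for a, v in zip(audio, visual)]


# Toy per-frame features: 4 time steps, 1 audio channel, 2 visual channels.
audio = [[0.0], [1.0], [2.0], [3.0]]
visual = [[1.0, 1.0], [1.0, 3.0], [1.0, 5.0], [1.0, 7.0]]
kernel = [0.5, 0.5]  # two-tap moving average over time

# Each stream is processed separately, then the results are fused.
fused = fuse(temporal_conv(audio, kernel), temporal_conv(visual, kernel))
print(fused)  # [[0.5, 1.0, 2.0], [1.5, 1.0, 4.0], [2.5, 1.0, 6.0]]
```

Each output time step depends only on a fixed window of input frames, so all time steps can be computed independently. A recurrent model, by contrast, must process frames sequentially.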
