HC | Nov 2, 2020

Multimodal Continuous Emotion Recognition using Deep Multi-Task Learning with Correlation Loss

arXiv:2011.00876v1 | 2 citations
AI Analysis

This work addresses emotion recognition for applications like human-computer interaction, but it is incremental as it builds on existing methods with multi-task learning and correlation losses.

The study tackles continuous emotion recognition by estimating Activation, Valence, and Dominance attributes from body motion and speech signals. It reports significant improvements with multi-task learning over single-task learning, and CCC gains of more than 7% on CreativeIT and 13% on RECOLA when training with the CCC loss instead of the MSE loss.

In this study, we focus on continuous emotion recognition using body motion and speech signals to estimate Activation, Valence, and Dominance (AVD) attributes. A semi-end-to-end network architecture is proposed in which both extracted features and raw signals are fed to the network, and this network is trained using multi-task learning (MTL) rather than the state-of-the-art single-task learning (STL). Furthermore, two correlation losses, the Concordance Correlation Coefficient (CCC) and the Pearson Correlation Coefficient (PCC), are used as optimization objectives during training. Experiments are conducted on the CreativeIT and RECOLA databases, and evaluations are performed using the CCC metric. To highlight the effect of MTL, correlation losses, and multi-modality, we compare MTL against STL, the CCC loss against the mean square error (MSE) and PCC losses, and multi-modality against single modality. We observe significant performance improvements with MTL training over STL, especially for the estimation of valence. Furthermore, the CCC loss achieves CCC improvements of more than 7% on CreativeIT and 13% on RECOLA over the MSE loss.
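To make the correlation objectives named in the abstract concrete, here is a minimal sketch of PCC, CCC, and a combined multi-task loss in plain Python. It uses the standard definitions of both coefficients. The function names, the `1 - CCC` loss form, and the equal weighting of the three attributes are assumptions for illustration, not details confirmed by the abstract.

```python
from math import sqrt


def _mean(xs):
    return sum(xs) / len(xs)


def pcc(pred, target):
    """Pearson Correlation Coefficient of two equal-length sequences."""
    mp, mt = _mean(pred), _mean(target)
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_t = sum((t - mt) ** 2 for t in target)
    return cov / sqrt(var_p * var_t)


def ccc(pred, target):
    """Concordance Correlation Coefficient (population statistics).

    CCC = 2 * cov / (var_pred + var_target + (mean_pred - mean_target)^2)
    """
    n = len(pred)
    mp, mt = _mean(pred), _mean(target)
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target)) / n
    var_p = sum((p - mp) ** 2 for p in pred) / n
    var_t = sum((t - mt) ** 2 for t in target) / n
    return 2.0 * cov / (var_p + var_t + (mp - mt) ** 2)


def multitask_ccc_loss(preds, targets):
    """Sum of (1 - CCC) over all attributes.

    Assumption: each attribute (e.g. activation, valence, dominance)
    contributes with equal weight; the paper may weight them differently.
    """
    return sum(1.0 - ccc(preds[name], targets[name]) for name in targets)


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the paper.
    target = [0.0, 1.0, 2.0, 3.0]
    shifted = [1.0, 2.0, 3.0, 4.0]  # same shape, constant offset
    print("PCC:", pcc(shifted, target))
    print("CCC:", ccc(shifted, target))
```

In the toy example, a constant offset leaves PCC at its maximum but lowers CCC, because the CCC denominator includes the squared difference of the means. This sensitivity to bias and scale is the usual argument for training on the same CCC quantity that is used for evaluation.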
