CL · Jun 2, 2017

Attentive Convolutional Neural Network based Speech Emotion Recognition: A Study on the Impact of Input Features, Signal Length, and Acted Speech

arXiv:1706.00612v1 · 233 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving emotion recognition accuracy for human-computer interaction, but the advance is incremental, since it builds on existing models and datasets.

The study tackled speech emotion recognition by investigating how input features, signal length, and speech type affect performance using an attentive convolutional neural network, achieving state-of-the-art results on improvised speech data from the IEMOCAP database.

Speech emotion recognition is an important and challenging task in human-computer interaction. Prior work has proposed a variety of models and feature sets for training such systems. In this work, we conduct extensive experiments using an attentive convolutional neural network with a multi-view learning objective function. We compare system performance across different input signal lengths, different types of acoustic features, and different types of emotional speech (improvised/scripted). Our experimental results on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database reveal that recognition performance strongly depends on the type of speech data, independent of the choice of input features. Furthermore, we achieved state-of-the-art results on the improvised speech data of IEMOCAP.

