CL · Mar 30, 2018

Automatically augmenting an emotion dataset improves classification using audio

arXiv:1803.11506v1 · 1088 citations
Originality: Synthesis-oriented
AI Analysis

This work tackles data scarcity in affective computing for speech emotion recognition. Its contribution is incremental: it applies existing methods to a new, automatically collected data source.

The paper addresses the shortage of annotated data for speech emotion classification. It automatically constructs a larger audio dataset from movies by applying textual sentiment analysis, and shows that pretraining a recurrent neural network on this dataset improves results on the EmotiW corpus.

In this work, we tackle the problem of speech emotion classification. One of the issues in affective computing is that the amount of annotated data is very limited. At the same time, the number of ways the same emotion can be expressed verbally is enormous because of variability between speakers. This is one of the factors that limit performance and generalization. We propose a simple method that extracts audio samples from movies using textual sentiment analysis. This makes it possible to automatically construct a larger dataset of audio samples containing positive, negative, and neutral emotional speech. We show that pretraining a recurrent neural network on such a dataset yields better results on the challenging EmotiW corpus. This experiment demonstrates a potential benefit of combining textual sentiment analysis with vocal information.
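The dataset-construction idea can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' pipeline. The text and timestamps are assumed to come from movie subtitles in SRT format. A tiny hypothetical word lexicon (`POSITIVE`/`NEGATIVE`) stands in for a real textual sentiment analyzer. The thresholds and minimum duration in `label_cues` are illustrative values, not values taken from the paper. The resulting (start, end, label) clips would then be cut from the movie's audio track to form the pretraining set.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical toy lexicon standing in for a real textual sentiment analyzer.
POSITIVE = {"love", "great", "happy", "wonderful", "thanks", "good"}
NEGATIVE = {"hate", "terrible", "sad", "awful", "angry", "bad"}


@dataclass
class Clip:
    start: float  # seconds into the movie's audio track
    end: float
    text: str
    label: str    # "positive", "negative" or "neutral"


_TIME = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as '00:01:02,500' to seconds."""
    h, m, s, ms = map(int, _TIME.fullmatch(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000.0


def parse_srt(srt: str) -> list[tuple[float, float, str]]:
    """Parse SRT subtitle text into (start, end, text) cues."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # not a well-formed cue: index, timing line, text
        start, end = (to_seconds(t) for t in lines[1].split("-->"))
        cues.append((start, end, " ".join(lines[2:])))
    return cues


def polarity(text: str) -> float:
    """Toy lexicon score in [-1, 1]; a real sentiment model would go here."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def label_cues(cues, pos_thr=0.5, neg_thr=-0.5, neutral_band=0.1, min_dur=1.0):
    """Keep confidently labeled cues; all thresholds are illustrative assumptions."""
    clips = []
    for start, end, text in cues:
        if end - start < min_dur:
            continue  # too short to carry much vocal information
        score = polarity(text)
        if score >= pos_thr:
            label = "positive"
        elif score <= neg_thr:
            label = "negative"
        elif abs(score) <= neutral_band:
            label = "neutral"
        else:
            continue  # ambiguous text: skip rather than risk a noisy label
        clips.append(Clip(start, end, text, label))
    return clips


SAMPLE_SRT = """1
00:00:01,000 --> 00:00:03,500
I love this, it is wonderful!

2
00:00:04,000 --> 00:00:06,000
This is terrible and I hate it.

3
00:00:07,000 --> 00:00:09,000
The train leaves at nine.

4
00:00:09,200 --> 00:00:09,700
Great.
"""

if __name__ == "__main__":
    for clip in label_cues(parse_srt(SAMPLE_SRT)):
        print(f"{clip.start:.1f}-{clip.end:.1f}s {clip.label}: {clip.text}")
```

On the sample input, the first three cues are labeled positive, negative, and neutral. The fourth cue is dropped because it is shorter than `min_dur`. Skipping ambiguous lines trades dataset size for label quality. Because the labels come from text alone, some kept clips may still contain speech whose vocal tone does not match the transcript.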
