HC, AI, CL | Mar 21, 2018

Speech Emotion Recognition Considering Local Dynamic Features

arXiv:1803.07738v1 | 10 citations
Originality: Incremental advance
AI Analysis

This work addresses accurate recognition of emotion in speech for applications such as human-computer interaction. It is an incremental advance because it builds on existing feature extraction methods.

The paper tackles speech emotion recognition by proposing a novel local dynamic pitch probability distribution feature that captures dynamic emotional expression, and it reports higher accuracy than traditional global features on the Berlin Database of Emotional Speech.

Recently, increasing attention has been directed to the study of speech emotion recognition, in which global acoustic features of an utterance are mostly used to eliminate content differences. However, the expression of emotion in speech is a dynamic process, reflected in the changing durations, energies, and other prosodic information of a speaker's utterance. In this paper, a novel local dynamic pitch probability distribution feature, which is obtained by drawing a histogram, is proposed to improve the accuracy of speech emotion recognition. In contrast to most previous works, which use global features, the proposed method takes advantage of the local dynamic information conveyed by emotional speech. Several experiments on the Berlin Database of Emotional Speech are conducted to verify the effectiveness of the proposed method. The experimental results demonstrate that the local dynamic information obtained with the proposed method is more effective for speech emotion recognition than the traditional global features.
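To make the contrast between local and global pitch features concrete, here is a minimal sketch and not the authors' implementation. The abstract says only that the feature is a pitch probability distribution obtained from a histogram. Everything else here is an assumption made for illustration: the function names, the equal-length segmentation, the number of segments and bins, the 50 to 500 Hz range, and the convention that a pitch of 0 marks an unvoiced frame.

```python
from statistics import mean, pstdev

def pitch_histogram(values, n_bins, lo, hi):
    """Normalized histogram (probability distribution) of pitch values in [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi falls into the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins

def local_pitch_features(f0, n_segments=4, n_bins=8, lo=50.0, hi=500.0):
    """Concatenate one pitch histogram per segment (hypothetical segmentation)."""
    bounds = [len(f0) * s // n_segments for s in range(n_segments + 1)]
    features = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        voiced = [v for v in f0[start:end] if v > 0]  # assumption: 0 = unvoiced
        features.extend(pitch_histogram(voiced, n_bins, lo, hi))
    return features

def global_pitch_features(f0):
    """Utterance-level statistics, standing in for 'global' features."""
    voiced = [v for v in f0 if v > 0]
    return [mean(voiced), pstdev(voiced), min(voiced), max(voiced)]

# Two synthetic contours with the same pitch values in opposite order.
rising = [100.0 + 5.0 * i for i in range(40)]
falling = rising[::-1]

print(global_pitch_features(rising))
print(global_pitch_features(falling))
print(local_pitch_features(rising)[:8])
print(local_pitch_features(falling)[:8])
```

The two synthetic contours contain the same pitch values, so their utterance-level statistics coincide, while their per-segment histograms differ because the values occur at different times. This is the kind of temporal information that a purely global summary discards.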

