Emotion Recognition From Speech With Recurrent Neural Networks
This work addresses emotion recognition from speech, which is important for applications like human-computer interaction, but it appears incremental as it builds on existing deep learning methods.
The paper tackles emotion recognition from speech using a deep recurrent neural network with a CTC loss function, which handles long utterances containing both emotional and neutral parts. The authors report high quality, supported by comparisons with recent advances and with human performance.
This paper considers the task of emotion recognition from speech. The proposed approach uses a deep recurrent neural network trained on a sequence of acoustic features calculated over small speech intervals. At the same time, a probabilistic Connectionist Temporal Classification (CTC) loss function makes it possible to handle long utterances containing both emotional and neutral parts. The effectiveness of this approach is shown in two ways. First, it is compared with recent advances in the field. Second, human performance on the same task is measured. Both criteria indicate the high quality of the proposed method.
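To illustrate why a CTC loss suits utterances that mix emotional and neutral parts, the following is a minimal sketch of the standard CTC forward recursion in plain Python. It is not the authors' implementation. The function name `ctc_probability`, the class indices, and the toy probabilities are all hypothetical, and treating the blank symbol as "no emotion in this frame" is an assumption made here for illustration. The function sums the probability of every frame-level alignment that collapses to the target label sequence, so the network is not forced to assign the emotion label to every frame.

```python
import math


def ctc_probability(frame_probs, labels, blank=0):
    """Total probability of all frame alignments that collapse to `labels`.

    frame_probs: list of T lists, each holding per-class probabilities
                 for one frame (for example, RNN softmax outputs).
    labels:      target label sequence without blanks.
    blank:       index of the blank class (assumed here to stand for
                 frames that carry no emotion).
    """
    # Interleave blanks around the labels: [b, l1, b, l2, ..., b].
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    num_states = len(ext)

    # Initialisation: a path may start in the first blank or the first label.
    alpha = [0.0] * num_states
    alpha[0] = frame_probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = frame_probs[0][ext[1]]

    # Forward recursion over the remaining frames.
    for t in range(1, len(frame_probs)):
        new_alpha = [0.0] * num_states
        for s in range(num_states):
            total = alpha[s]                      # stay in the same state
            if s >= 1:
                total += alpha[s - 1]             # advance by one state
            # Skip over a blank only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * frame_probs[t][ext[s]]
        alpha = new_alpha

    # Valid paths end in the last label or the trailing blank.
    result = alpha[-1]
    if num_states > 1:
        result += alpha[-2]
    return result


# Toy input (hypothetical): 3 frames, classes 0 = blank, 1 = "angry", 2 = "sad".
uniform = [[1 / 3, 1 / 3, 1 / 3] for _ in range(3)]
prob_angry = ctc_probability(uniform, [1])
ctc_loss = -math.log(prob_angry)  # the quantity minimised during training
```

With uniform frame probabilities, six of the 27 possible three-frame paths collapse to the single label "angry", so `prob_angry` equals 6/27. In practice the recursion is usually carried out in log space for numerical stability on long utterances.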