AS AI LG SD ML | Jun 11, 2019

Focal Loss based Residual Convolutional Neural Network for Speech Emotion Recognition

Suraj Tripathi, Abhay Kumar, Abhiram Ramesh, Chirag Singh, Promod Yenigalla

arXiv:1906.05682v12.314 citations | h-index: 10

Originality: Synthesis-oriented

AI Analysis

This work addresses emotion recognition in speech, which matters for applications such as human-computer interaction, but it appears incremental because it combines existing methods (a residual CNN and Focal Loss).

The paper tackles speech emotion recognition by proposing a Residual Convolutional Neural Network trained with Focal Loss, and it reports improved performance on benchmark datasets.

This paper proposes a Residual Convolutional Neural Network (ResNet) based on speech features and trained under Focal Loss to recognize emotion in speech. Speech features such as spectrograms and Mel-frequency Cepstral Coefficients (MFCCs) have shown the ability to characterize emotion better than plain text alone. Further, Focal Loss, first used in one-stage object detectors, has shown the ability to focus the training process on hard examples and down-weight the loss assigned to well-classified examples, thus preventing the model from being overwhelmed by easily classifiable examples.
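
The down-weighting described in the abstract can be illustrated with a minimal sketch of the standard multi-class focal loss, FL(p_t) = -(1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. This is not the authors' implementation: the function names, the four-class logits, and γ = 2 are assumptions chosen for illustration, and the class-balancing weight α is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard cross-entropy for one example: -log(p_t)."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    gamma=2.0 is an assumed default for this sketch, not a value taken
    from the paper. With gamma=0 this reduces to cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Hypothetical logits for a 4-class problem; the true class is index 0.
easy = [4.0, 0.0, 0.0, 0.0]  # confidently correct (well-classified)
hard = [0.0, 1.0, 0.0, 0.0]  # true class gets a low score (hard example)

for name, logits in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(logits, 0)
    fl = focal_loss(logits, 0)
    # The ratio FL/CE equals (1 - p_t)^gamma: near 0 for easy examples,
    # closer to 1 for hard ones.
    print(f"{name}: CE={ce:.4f} FL={fl:.4f} ratio={fl / ce:.4f}")
```

The ratio of focal loss to cross-entropy is the modulating factor (1 - p_t)^γ, so the well-classified example contributes only a small fraction of its cross-entropy loss while the hard example keeps most of it.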
