AS CV HC | Sep 15, 2021

FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition

arXiv:2109.07916v1
Originality: Incremental advance
AI Analysis

This work addresses speech emotion recognition, with potential applications in mental and emotional healthcare. It appears incremental, since it builds on existing CNN approaches and mainly changes the input features.

The authors tackled speech emotion recognition by developing FSER, a deep convolutional neural network model that uses mel-spectrograms instead of MFCC features, achieving 95.05% accuracy across 8 emotion classes on four speech databases and outperforming previous models.

Using mel-spectrograms instead of conventional MFCC features, we assess the ability of convolutional neural networks to accurately recognize and classify emotions from speech data. We introduce FSER, a speech emotion recognition model trained on four valid speech databases, which achieves a high classification accuracy of 95.05% over 8 different emotion classes: anger, anxiety, calm, disgust, happiness, neutral, sadness, and surprise. On each benchmark dataset, FSER outperforms the best models introduced so far, achieving state-of-the-art performance. We show that FSER stays reliable independently of the language, sex identity, and any other external factor. Additionally, we describe how FSER could potentially be used to improve mental and emotional health care, and how our analysis and findings serve as guidelines and benchmarks for further work in the same direction.
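The abstract contrasts mel-spectrograms with MFCC features without spelling out how the two relate. The sketch below is a minimal, standard-library illustration of the textbook relationship, not the authors' pipeline: mel bands are spaced evenly on the mel scale, and MFCCs are obtained by applying a discrete cosine transform to the log-mel energies of a frame and keeping the first few coefficients. All function names, the HTK-style mel formula, and the unnormalised DCT-II are assumptions chosen for illustration.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel using the HTK-style formula (an assumed convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(n_mels, f_min, f_max):
    """Centre frequencies (Hz) of n_mels bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


def dct_ii(x):
    """Unnormalised DCT-II of a sequence."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi * k * (t + 0.5) / n) for t in range(n))
        for k in range(n)
    ]


def mfcc_from_log_mel(log_mel_frame, n_coeffs):
    """MFCCs for one frame: DCT of the log-mel energies, truncated to n_coeffs.

    The truncation discards fine spectral detail that a mel-spectrogram keeps.
    """
    return dct_ii(log_mel_frame)[:n_coeffs]
```

For example, `mel_center_frequencies(40, 0.0, 8000.0)` returns 40 centre frequencies that are dense at low frequencies and sparse at high ones, and `mfcc_from_log_mel(frame, 13)` compresses a 40-band log-mel frame into 13 coefficients. A CNN fed the mel-spectrogram sees the full time-frequency image rather than this compressed summary.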

