SD, LG, AS; Jan 26

Enhancing Speech Emotion Recognition using Dynamic Spectral Features and Kalman Smoothing

arXiv:2601.18908v1
Originality: Incremental advance
AI Analysis

This work addresses misclassification issues in speech emotion recognition systems, particularly in noisy environments, but appears incremental as it builds on existing feature extraction methods.

The paper tackles misclassification in speech emotion recognition caused by acoustic noise by adding dynamic spectral features and Kalman smoothing, reporting a state-of-the-art accuracy of 87% on the RAVDESS dataset.

Speech Emotion Recognition systems often rely on static features such as Mel-Frequency Cepstral Coefficients (MFCCs), Zero Crossing Rate (ZCR), and Root Mean Square Energy (RMSE). As a result, they can misclassify emotions when the vocal signal contains acoustic noise. To address this, we added dynamic spectral features (Deltas and Delta-Deltas) together with the Kalman Smoothing algorithm. This approach reduces noise and improves emotion classification. Because emotion changes over time, Kalman smoothing also helped make the classifier outputs more stable. Tests on the RAVDESS dataset showed that this method achieved a state-of-the-art accuracy of 87% and reduced misclassification between emotions with similar acoustic features.
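The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the delta window, the state model, or the noise settings, so the regression-style delta with `width=2`, the scalar random-walk Kalman model with a Rauch-Tung-Striebel backward pass, and the parameters `q` and `r` are all assumptions chosen for illustration. Function names are hypothetical.

```python
def delta(frames, width=2):
    """Regression-style delta of per-frame feature vectors (edges are padded).

    d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2), k = 1..width.
    The window width is an assumption; the abstract does not specify it.
    """
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(len(frames[t])):
            num = 0.0
            for k in range(1, width + 1):
                nxt = frames[min(t + k, n - 1)][d]  # repeat last frame at the end
                prv = frames[max(t - k, 0)][d]      # repeat first frame at the start
                num += k * (nxt - prv)
            row.append(num / denom)
        out.append(row)
    return out


def add_dynamic_features(frames, width=2):
    """Concatenate static features with their deltas and delta-deltas."""
    d1 = delta(frames, width)
    d2 = delta(d1, width)
    return [s + a + b for s, a, b in zip(frames, d1, d2)]


def kalman_smooth(z, q=1e-3, r=1e-1):
    """Smooth a 1-D sequence with a random-walk Kalman filter plus RTS pass.

    q (process noise) and r (measurement noise) are hypothetical values.
    """
    n = len(z)
    xp, pp = [0.0] * n, [0.0] * n  # predicted mean / variance
    xf, pf = [0.0] * n, [0.0] * n  # filtered mean / variance
    x, p = z[0], 1.0
    for t in range(n):
        xp[t], pp[t] = x, p + q            # predict (state transition = 1)
        k = pp[t] / (pp[t] + r)            # Kalman gain
        x = xp[t] + k * (z[t] - xp[t])     # update with measurement z[t]
        p = (1.0 - k) * pp[t]
        xf[t], pf[t] = x, p
    xs = xf[:]
    for t in range(n - 2, -1, -1):         # backward (smoothing) pass
        g = pf[t] / pp[t + 1]
        xs[t] = xf[t] + g * (xs[t + 1] - xp[t + 1])
    return xs
```

In this sketch, `add_dynamic_features` would be applied to a list of per-frame static vectors (for example MFCCs), and `kalman_smooth` to a single trajectory over time, such as one feature dimension or one class probability from the classifier. Which of these the paper smooths is not fully specified in the abstract.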

