Enhancing Speech Emotion Recognition using Dynamic Spectral Features and Kalman Smoothing

Marouane El Hizabri, Abdelfattah Bezzaz, Ismail Hayoukane, Youssef Taki

arXiv:2601.18908v1

Originality: Incremental advance

AI Analysis

This work addresses misclassification issues in speech emotion recognition systems, particularly in noisy environments, but appears incremental as it builds on existing feature extraction methods.

The paper tackles speech emotion recognition misclassification caused by acoustic noise by adding dynamic spectral features and Kalman smoothing, and reports a state-of-the-art accuracy of 87% on the RAVDESS dataset.

Speech Emotion Recognition systems often rely on static features such as Mel-Frequency Cepstral Coefficients (MFCCs), Zero Crossing Rate (ZCR), and Root Mean Square Energy (RMSE). As a result, they can misclassify emotions when vocal signals contain acoustic noise. To address this, we add dynamic spectral features (Deltas and Delta-Deltas) and apply the Kalman Smoothing algorithm. This approach reduces noise and improves emotion classification. Because emotion changes over time, the Kalman Smoothing filter also makes the classifier outputs more stable. Tests on the RAVDESS dataset show that this method achieves a state-of-the-art accuracy of 87% and reduces misclassification between emotions with similar acoustic features.
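The two ingredients named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the delta window, the Kalman state model, or the noise settings. The sketch assumes the common regression-based delta formula with a window of 2 frames, and a scalar random-walk Kalman filter followed by a Rauch-Tung-Striebel backward pass applied independently to each column. The function names (`delta`, `add_dynamic_features`, `kalman_smooth`) and the values of `width`, `q`, and `r` are hypothetical choices for illustration.

```python
import numpy as np


def delta(features, width=2):
    """Regression-based delta along the time axis.

    features: array of shape (T, D), one row per frame.
    width: number of frames on each side (assumed value, not from the paper).
    """
    features = np.asarray(features, dtype=float)
    T = features.shape[0]
    # Repeat the edge frames so every frame has `width` neighbours on each side.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features)
    for n in range(1, width + 1):
        ahead = padded[width + n: width + n + T]    # c[t + n]
        behind = padded[width - n: width - n + T]   # c[t - n]
        out += n * (ahead - behind)
    return out / denom


def add_dynamic_features(features, width=2):
    """Stack static features with their deltas and delta-deltas: (T, 3 * D)."""
    d = delta(features, width)
    dd = delta(d, width)
    return np.hstack([np.asarray(features, dtype=float), d, dd])


def kalman_smooth(z, q=1e-3, r=1e-1):
    """Random-walk Kalman filter plus RTS smoother on each column of z (T, D).

    q: process noise variance, r: measurement noise variance (both assumed).
    """
    z = np.asarray(z, dtype=float)
    T = z.shape[0]
    x_pred = np.zeros_like(z)
    p_pred = np.zeros_like(z)
    x_filt = np.zeros_like(z)
    p_filt = np.zeros_like(z)
    x, p = z[0].copy(), np.ones(z.shape[1])
    for t in range(T):
        # Predict: the state is assumed to follow a random walk.
        x_pred[t] = x
        p_pred[t] = p + q
        # Update with the observation at frame t.
        k = p_pred[t] / (p_pred[t] + r)
        x = x_pred[t] + k * (z[t] - x_pred[t])
        p = (1.0 - k) * p_pred[t]
        x_filt[t], p_filt[t] = x, p
    # Backward (Rauch-Tung-Striebel) pass.
    x_smooth = x_filt.copy()
    for t in range(T - 2, -1, -1):
        g = p_filt[t] / p_pred[t + 1]
        x_smooth[t] = x_filt[t] + g * (x_smooth[t + 1] - x_pred[t + 1])
    return x_smooth
```

Under these assumptions, `add_dynamic_features` would be applied to a per-frame feature matrix (for example MFCCs) before classification, and `kalman_smooth` to a sequence of per-frame values such as class scores. Whether the paper smooths the features, the classifier outputs, or both is not fully specified in the abstract.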
