AS LG SD SP · Feb 8, 2021

Non-linear frequency warping using constant-Q transformation for speech emotion recognition

Premjeet Singh, Goutam Saha, Md Sahidullah

arXiv:2102.04029v13.324 citations

Originality: Incremental advance

AI Analysis

This work provides an incremental improvement in feature extraction for speech emotion recognition, potentially benefiting applications requiring robust emotion detection.

This paper explores the Constant-Q Transform (CQT) for speech emotion recognition (SER), demonstrating that CQT-based features outperform standard Short-Time Fourier Transform (STFT) features. The CQT-based systems also show better generalization in cross-corpora evaluations.

In this work, we explore the constant-Q transform (CQT) for speech emotion recognition (SER). CQT-based time-frequency analysis provides variable spectro-temporal resolution, with higher frequency resolution at lower frequencies. Since the lower-frequency regions of the speech signal contain more emotion-related information than the higher-frequency regions, the increased low-frequency resolution of the CQT makes it more promising for SER than the standard short-time Fourier transform (STFT). We present a comparative analysis of short-term acoustic features based on the STFT and the CQT for SER, with a deep neural network (DNN) as the back-end classifier. We optimize different parameters for both features. The CQT-based features outperform the STFT-based spectral features in SER experiments. Further experiments with cross-corpora evaluation demonstrate that the CQT-based systems generalize better with out-of-domain training data.
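The resolution trade-off described in the abstract can be illustrated with a minimal NumPy sketch. It compares the per-bin bandwidth of a CQT, whose center frequencies are geometrically spaced, with the fixed bin spacing of an STFT. All values here (16 kHz sampling rate, 32.7 Hz minimum frequency, 12 bins per octave, 84 bins, 512-point FFT) are illustrative assumptions and not the settings used in the paper, and the helper names are hypothetical.

```python
import numpy as np


def cqt_center_frequencies(f_min, bins_per_octave, n_bins):
    """Geometrically spaced center frequencies: f_k = f_min * 2^(k / B)."""
    k = np.arange(n_bins)
    return f_min * 2.0 ** (k / bins_per_octave)


def cqt_bandwidths(center_freqs, bins_per_octave):
    """Per-bin bandwidth f_k / Q, where Q = 1 / (2^(1/B) - 1) is constant."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    return center_freqs / q


def stft_bin_spacing(sample_rate, n_fft):
    """Uniform frequency spacing between adjacent STFT bins."""
    return sample_rate / n_fft


# Illustrative values only (assumptions, not taken from the paper).
sample_rate = 16000      # Hz
f_min = 32.7             # Hz, lowest CQT center frequency
bins_per_octave = 12
n_bins = 84              # 7 octaves; top bin stays below Nyquist (8 kHz)
n_fft = 512

freqs = cqt_center_frequencies(f_min, bins_per_octave, n_bins)
bws = cqt_bandwidths(freqs, bins_per_octave)
stft_df = stft_bin_spacing(sample_rate, n_fft)

print(f"STFT bin spacing (constant): {stft_df:.2f} Hz")
print(f"CQT bandwidth at {freqs[0]:.1f} Hz: {bws[0]:.2f} Hz")
print(f"CQT bandwidth at {freqs[-1]:.1f} Hz: {bws[-1]:.2f} Hz")
```

With these assumed values, the CQT bins are narrower than the STFT bins at the low end of the spectrum and wider at the high end, which is the variable resolution the abstract refers to.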
