Evaluating Gammatone Frequency Cepstral Coefficients with Neural Networks for Emotion Recognition from Speech
This work addresses speech emotion recognition, which matters for applications such as human-computer interaction, but its contribution is incremental: it compares two existing feature types rather than introducing a new method.
The paper tackles the problem of improving speech emotion recognition by proposing Gammatone Frequency Cepstral Coefficients (GFCCs) as a better representation than the commonly used Mel Frequency Cepstral Coefficients (MFCCs), and reports that GFCCs outperform MFCCs on emotion and intensity classification tasks with fully connected and recurrent neural networks.
Current approaches to speech emotion recognition focus on speech features that can capture the emotional content of a speech signal. Mel Frequency Cepstral Coefficients (MFCCs) are among the most commonly used representations for speech recognition and audio classification. This paper proposes Gammatone Frequency Cepstral Coefficients (GFCCs) as a potentially better representation of speech signals for emotion recognition. The effectiveness of the MFCC and GFCC representations is compared and evaluated on emotion and intensity classification tasks with fully connected and recurrent neural network architectures. The results provide evidence that GFCCs outperform MFCCs in speech emotion recognition.
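
To make the MFCC/GFCC distinction concrete, the following is a minimal sketch of where the two representations typically differ: the frequency scale on which the filterbank is laid out (mel for MFCCs, the ERB-rate scale commonly used with gammatone filters for GFCCs) and the shared final step of compressing band energies and applying a DCT. It is not the paper's pipeline, which is not specified here. The function names, the 50 Hz to 8000 Hz range, the 32 filters, and the choice of log compression are all illustrative assumptions, and the filter shapes themselves (triangular versus gammatone) are not implemented.

```python
import math

# Mel scale (a common formula used for MFCC filterbanks).
def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# ERB-rate scale (Glasberg and Moore), often used to space gammatone filters.
def hz_to_erb_rate(f_hz):
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def center_frequencies(f_min, f_max, n_filters, to_scale, from_scale):
    """Center frequencies in Hz, equally spaced on the given perceptual scale."""
    lo, hi = to_scale(f_min), to_scale(f_max)
    step = (hi - lo) / (n_filters - 1)
    return [from_scale(lo + i * step) for i in range(n_filters)]

def dct_ii(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]

def cepstral_coefficients(band_energies, n_coeffs, compress=math.log):
    """Compress per-band energies, then decorrelate them with a DCT.

    `compress` is an assumption: log is the usual choice for MFCCs, while
    some GFCC implementations use a cube root instead.
    """
    floor = 1e-10  # avoid log(0) on silent bands
    compressed = [compress(max(e, floor)) for e in band_energies]
    return dct_ii(compressed, n_coeffs)

# Hypothetical settings, chosen only for illustration.
mel_centers = center_frequencies(50.0, 8000.0, 32, hz_to_mel, mel_to_hz)
erb_centers = center_frequencies(50.0, 8000.0, 32, hz_to_erb_rate, erb_rate_to_hz)
```

Given per-band energies for one frame from either filterbank, `cepstral_coefficients(energies, 13)` would yield a 13-dimensional feature vector for that frame; stacking such vectors over time gives the kind of sequence a fully connected or recurrent network could consume.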