SD, LG, AS · Apr 12, 2022

Speech Emotion Recognition with Global-Aware Fusion on Multi-scale Feature Representation

arXiv:2204.05571v1 · 56 citations · h-index: 16 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of capturing rich emotional features at different scales and global information in speech emotion recognition, representing an incremental improvement over existing methods.

The paper tackles the problem of speech emotion recognition by proposing a GLobal-Aware Multi-scale (GLAM) neural network to capture multi-scale features and global emotional information, achieving 2.5% to 4.5% improvements on four metrics over state-of-the-art methods on the IEMOCAP corpus.

Speech Emotion Recognition (SER) is a fundamental task that predicts the emotion label from speech data. Recent works mostly use convolutional neural networks (CNNs) to learn local attention maps on fixed-scale feature representations, treating time-varying spectral features as images. However, existing CNNs for SER capture neither the rich emotional features at different scales nor important global information well. In this paper, we propose a novel GLobal-Aware Multi-scale (GLAM) neural network that learns multi-scale feature representations and uses a global-aware fusion module to attend to emotional information (the code is available at https://github.com/lixiangucas01/GLAM). Specifically, GLAM iteratively applies multiple convolutional kernels of different scales to learn multiple feature representations. Then, instead of attention-based methods, a simple but effective global-aware fusion module captures the most important emotional information globally. Experiments on the benchmark corpus IEMOCAP over four emotions demonstrate the superiority of the proposed model, with improvements of 2.5% to 4.5% on four common metrics over previous state-of-the-art approaches.
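The two ideas in the abstract, parallel kernels of different scales followed by a fusion step that looks at the whole input, can be illustrated with a minimal sketch. It is not the GLAM implementation: the 1-D input, the fixed averaging kernels, the kernel sizes (1, 3, 5), and the softmax-over-global-means fusion are all assumptions chosen to keep the example dependency-free, and the function names are hypothetical. The authors' actual model is in the repository linked above.

```python
import math


def conv1d_same(signal, kernel):
    """Cross-correlate signal with kernel, zero-padding to keep the input length."""
    k = len(kernel)
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    padded = [0.0] * pad_left + list(signal) + [0.0] * pad_right
    return [
        sum(padded[t + i] * kernel[i] for i in range(k))
        for t in range(len(signal))
    ]


def multi_scale_features(signal, kernel_sizes=(1, 3, 5)):
    """Return one feature map per kernel size.

    Averaging kernels stand in for learned convolution weights (assumption).
    """
    return [conv1d_same(signal, [1.0 / k] * k) for k in kernel_sizes]


def global_fusion(feature_maps):
    """Fuse the scales using a statistic computed over the whole sequence.

    Illustrative stand-in only: each map is weighted by a softmax over its
    global mean. This is not the paper's global-aware fusion module.
    """
    means = [sum(fm) / len(fm) for fm in feature_maps]
    peak = max(means)  # subtract the max for a numerically stable softmax
    exps = [math.exp(m - peak) for m in means]
    total = sum(exps)
    weights = [e / total for e in exps]
    length = len(feature_maps[0])
    fused = [
        sum(w * fm[t] for w, fm in zip(weights, feature_maps))
        for t in range(length)
    ]
    return fused, weights


# Toy stand-in for one spectral feature track over eight time frames.
frames = [0.0, 1.0, 0.0, 2.0, 4.0, 2.0, 0.0, 1.0]
maps = multi_scale_features(frames)
fused, weights = global_fusion(maps)
```

Larger kernels smooth over more frames, so each map summarizes the same signal at a different temporal scale. The fusion weights depend on every frame of every map, which is the sense in which this toy fusion is "global" rather than a local attention map.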

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
