SD AI AS · Apr 21, 2024

MFHCA: Enhancing Speech Emotion Recognition Via Multi-Spatial Fusion and Hierarchical Cooperative Attention

arXiv:2404.13509v1 · 12.0 · 12 citations · h-index: 12 · ICME

Originality: Incremental advance

AI Analysis

This work addresses the challenge of extracting emotional cues from audio for human-computer interaction, representing an incremental advance in speech emotion recognition.

The paper tackles speech emotion recognition by proposing MFHCA, a method using multi-spatial fusion and hierarchical cooperative attention on spectrograms and raw audio, achieving improvements of 2.6% in weighted accuracy and 1.87% in unweighted accuracy on the IEMOCAP dataset.
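The two reported metrics can be made concrete. In IEMOCAP speech emotion recognition work, weighted accuracy (WA) is conventionally the overall fraction of correctly classified utterances, and unweighted accuracy (UA) is the mean of per-class recalls, so rare emotions count as much as common ones. The sketch below assumes that convention; the function names are illustrative, and the paper's exact evaluation protocol (for example, how sessions are split) is not described on this page.

```python
from collections import Counter

def weighted_accuracy(y_true, y_pred):
    """WA: fraction of all utterances classified correctly (overall accuracy)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def unweighted_accuracy(y_true, y_pred):
    """UA: mean of per-class recalls, so each emotion class counts equally."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    totals = Counter(y_true)  # utterances per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)

y_true = ["neu", "neu", "neu", "ang"]
y_pred = ["neu", "neu", "neu", "neu"]  # always predicts the majority class
print(weighted_accuracy(y_true, y_pred))    # 0.75
print(unweighted_accuracy(y_true, y_pred))  # 0.5
```

A majority-class predictor scores well on WA but poorly on UA, which is why both numbers are usually reported on a class-imbalanced dataset such as IEMOCAP.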

Speech emotion recognition is crucial in human-computer interaction, but extracting and using emotional cues from audio poses challenges. This paper introduces MFHCA, a novel method for Speech Emotion Recognition using Multi-Spatial Fusion and Hierarchical Cooperative Attention on spectrograms and raw audio. We employ the Multi-Spatial Fusion module (MF) to efficiently identify emotion-related spectrogram regions and integrate HuBERT features for higher-level acoustic information. Our approach also includes a Hierarchical Cooperative Attention module (HCA) to merge features from various auditory levels. We evaluate our method on the IEMOCAP dataset and achieve improvements of 2.6% in weighted accuracy (WA) and 1.87% in unweighted accuracy (UA). Extensive experiments demonstrate the effectiveness of the proposed method.
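To illustrate the general idea of letting spectrogram-derived features draw on HuBERT features, here is a minimal scaled dot-product cross-attention in plain Python. This is a generic technique, not the paper's MF or HCA module, whose internals are not described here. The `fuse` helper is hypothetical; it assumes both feature streams have already been projected to the same dimension and omits the learned query/key/value projections a real model would use.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows
    and returns the attention-weighted average of the value rows."""
    d = len(keys[0])
    out = []
    for q in queries:
        if len(q) != d:
            raise ValueError("query and key dimensions must match")
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def fuse(spec_frames, hubert_frames):
    """Hypothetical fusion step: spectrogram frames query HuBERT frames, and each
    frame's original features are concatenated with what it attended to."""
    attended = cross_attention(spec_frames, hubert_frames, hubert_frames)
    return [s + a for s, a in zip(spec_frames, attended)]  # list concatenation

spec = [[1.0, 0.0], [0.0, 1.0]]                # 2 toy spectrogram frames, dim 2
hubert = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 toy HuBERT frames, dim 2
fused = fuse(spec, hubert)
print(len(fused), len(fused[0]))  # 2 4
```

Concatenation keeps the original spectrogram evidence alongside the attended HuBERT context; summation or gating are common alternatives, and which combination MFHCA actually uses is not stated on this page.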

