CL · SD · AS · Jan 27, 2022

Sentiment-Aware Automatic Speech Recognition pre-training for enhanced Speech Emotion Recognition

arXiv:2201.11826v1 · 23 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving speech emotion recognition for applications such as human-computer interaction. Its contribution is incremental, since it builds on existing ASR and SER methods.

The paper tackles Speech Emotion Recognition (SER) with a multi-task pre-training method that combines Automatic Speech Recognition (ASR) and sentiment classification, and reports a state-of-the-art concordance correlation coefficient (CCC) of 0.41 for valence prediction on the MSP-Podcast dataset.
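The CCC figure quoted here is Lin's concordance correlation coefficient, CCC = 2·cov(x, y) / (σ_x² + σ_y² + (μ_x − μ_y)²). Unlike Pearson correlation, it also penalizes differences in mean and scale between predictions and labels. The following is a minimal Python sketch of that standard formula, assuming population (biased) moments; the function name `concordance_cc` is illustrative and not taken from the paper's code.

```python
def concordance_cc(preds, labels):
    """Lin's concordance correlation coefficient between two equal-length sequences."""
    n = len(preds)
    if n == 0 or n != len(labels):
        raise ValueError("inputs must be non-empty and of equal length")
    mu_p = sum(preds) / n
    mu_l = sum(labels) / n
    # Population (divide-by-n) variances and covariance
    var_p = sum((p - mu_p) ** 2 for p in preds) / n
    var_l = sum((l - mu_l) ** 2 for l in labels) / n
    cov = sum((p - mu_p) * (l - mu_l) for p, l in zip(preds, labels)) / n
    # The squared mean difference is what makes CCC sensitive to bias
    denom = var_p + var_l + (mu_p - mu_l) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2 * cov / denom


print(concordance_cc([1, 2, 3], [1, 2, 3]))  # 1.0
print(concordance_cc([2, 3, 4], [1, 2, 3]))  # 4/7, about 0.571
```

Shifting perfectly correlated predictions by a constant, as in the second call, lowers CCC to 4/7 even though the Pearson correlation stays at 1.0. This is why CCC is commonly used for dimensional targets such as valence.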

We propose a novel multi-task pre-training method for Speech Emotion Recognition (SER). We pre-train the SER model simultaneously on Automatic Speech Recognition (ASR) and sentiment classification tasks to make the acoustic ASR model more "emotion-aware". We generate targets for the sentiment classification using a text-to-sentiment model trained on publicly available data. Finally, we fine-tune the acoustic ASR model on emotion-annotated speech data. We evaluated the proposed approach on the MSP-Podcast dataset, where we achieved the best reported concordance correlation coefficient (CCC) of 0.41 for valence prediction.
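One way to picture the pre-training objective is as a weighted sum of an ASR loss and a sentiment loss. In this picture, the sentiment targets come from a text-to-sentiment model applied to the transcript. The sketch below shows only that combination, in plain Python with no deep-learning framework. Several details are assumptions for illustration and are not confirmed by the abstract: the three-class sentiment layout, the hypothetical `toy_text_to_sentiment` stub, the soft-target cross-entropy, and the `sentiment_weight` parameter. The ASR loss is passed in as an already-computed number.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def sentiment_loss(acoustic_logits, target_probs):
    """Cross-entropy between the acoustic model's sentiment head and
    text-derived target probabilities (assumed soft targets; one-hot works too)."""
    logp = log_softmax(acoustic_logits)
    return -sum(t * lp for t, lp in zip(target_probs, logp))


def multitask_loss(asr_loss, acoustic_logits, transcript,
                   text_to_sentiment, sentiment_weight=1.0):
    """Hypothetical combined objective: ASR loss + weighted sentiment loss.
    `text_to_sentiment` maps a transcript to class probabilities (pseudo-labels)."""
    target = text_to_sentiment(transcript)
    return asr_loss + sentiment_weight * sentiment_loss(acoustic_logits, target)


def toy_text_to_sentiment(text):
    """Hypothetical stand-in for a trained text model; classes assumed to be
    [negative, neutral, positive]."""
    return [0.0, 0.0, 1.0] if "great" in text else [0.0, 1.0, 0.0]


# Uniform logits against a one-hot target give a sentiment loss of log(3)
print(multitask_loss(2.0, [0.0, 0.0, 0.0], "a great day",
                     toy_text_to_sentiment, sentiment_weight=0.5))  # about 2.549
```

Because the targets are generated from text rather than annotated, any speech corpus with transcripts can supply both loss terms during pre-training. Emotion-annotated data is then needed only for the fine-tuning stage.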

