CL SD AS Jan 27, 2022

Sentiment-Aware Automatic Speech Recognition pre-training for enhanced Speech Emotion Recognition

Ayoub Ghriss, Bo Yang, Viktor Rozgic, Elizabeth Shriberg, Chao Wang

arXiv:2201.11826v12.323 citations

Originality: Incremental advance

AI Analysis

This work aims to improve emotion recognition from speech for applications such as human-computer interaction. The advance is incremental, since it builds on existing ASR and SER methods.

The paper improves Speech Emotion Recognition (SER) with a multi-task pre-training method that combines Automatic Speech Recognition (ASR) and sentiment classification. The authors report a concordance correlation coefficient (CCC) of 0.41 for valence prediction on the MSP-Podcast dataset, which they describe as the best reported result.
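For readers unfamiliar with the metric, the following is a minimal sketch of how a concordance correlation coefficient can be computed between predicted and annotated valence scores. It uses the standard CCC definition with population variances. The function name `concordance_cc` and the sample values are illustrative assumptions, not taken from the paper.

```python
def concordance_cc(pred, gold):
    """Concordance correlation coefficient between two equal-length sequences.

    CCC = 2 * cov(pred, gold) / (var(pred) + var(gold) + (mean(pred) - mean(gold))**2)
    Population (1/n) variances and covariance are used.
    """
    if len(pred) != len(gold) or len(pred) < 2:
        raise ValueError("inputs must have the same length (at least 2)")

    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n

    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        # Both sequences are constant and identical: the ratio is undefined.
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2 * cov / denom


# Hypothetical valence scores, for illustration only.
predicted = [1.0, 2.0, 3.0]
annotated = [2.0, 3.0, 4.0]
ccc = concordance_cc(predicted, annotated)
```

Unlike Pearson correlation, CCC penalizes a constant offset between predictions and annotations: in the example the two sequences are perfectly correlated, yet the CCC is below 1 because the means differ.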

We propose a novel multi-task pre-training method for Speech Emotion Recognition (SER). We pre-train the SER model simultaneously on Automatic Speech Recognition (ASR) and sentiment classification tasks to make the acoustic ASR model more "emotion aware". We generate targets for the sentiment classification using a text-to-sentiment model trained on publicly available data. Finally, we fine-tune the acoustic ASR model on emotion-annotated speech data. We evaluated the proposed approach on the MSP-Podcast dataset, where we achieved the best reported concordance correlation coefficient (CCC) of 0.41 for valence prediction.
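The joint pre-training objective described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the two losses are combined, so the weighted sum, the weight `alpha`, and the function names below are all assumptions made for illustration. The ASR loss is treated as an already computed scalar, and the sentiment label stands for a pseudo-label produced by a text-to-sentiment model.

```python
import math


def cross_entropy(logits, target_index):
    """Cross-entropy of a single example, computed from raw class logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def multitask_loss(asr_loss, sentiment_logits, sentiment_label, alpha=0.5):
    """Hypothetical joint objective: ASR loss plus weighted sentiment loss.

    asr_loss         -- scalar ASR loss for the utterance (computed elsewhere)
    sentiment_logits -- class scores from the sentiment head
    sentiment_label  -- index of the pseudo-label from a text-to-sentiment model
    alpha            -- assumed weighting factor, not taken from the paper
    """
    return asr_loss + alpha * cross_entropy(sentiment_logits, sentiment_label)


# Illustrative values only: three sentiment classes with uniform logits.
loss = multitask_loss(asr_loss=2.0, sentiment_logits=[0.0, 0.0, 0.0], sentiment_label=1)
```

In a setup of this kind, both terms would be backpropagated through a shared acoustic encoder, so the sentiment term nudges the ASR representations toward emotion-relevant cues before fine-tuning on emotion-annotated speech.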

