SD · CL · LG · AS · Jul 21, 2023

A Change of Heart: Improving Speech Emotion Recognition through Speech-to-Text Modality Conversion

arXiv:2307.11584v1 · 5 citations · h-index: 17 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses emotion recognition from speech, a domain-specific task, with incremental improvements through modality conversion.

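The paper tackles speech emotion recognition by converting speech to text with ASR and classifying emotions from the transcript. The ASR-based pipeline yields substantial results, and the variant that assumes perfect ASR output outperforms state-of-the-art speech-based approaches in weighted-F1 on the MELD dataset.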
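For readers unfamiliar with the metric, weighted-F1 is the mean of per-class F1 scores weighted by each class's share of the true labels. The following is a minimal pure-Python sketch of that definition, not the paper's evaluation code; the function name `weighted_f1` and the toy labels are illustrative assumptions.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 (hypothetical helper, not from the paper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its support
    return score


# Toy labels for illustration only (not MELD data).
truth = ["joy", "joy", "anger", "neutral"]
preds = ["joy", "neutral", "anger", "neutral"]
print(weighted_f1(truth, preds))
```

Because frequent classes carry more weight, this metric suits datasets with imbalanced emotion labels, where a plain macro average would give rare classes equal influence.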

Speech Emotion Recognition (SER) is a challenging task. In this paper, we introduce a modality conversion concept aimed at enhancing emotion recognition performance on the MELD dataset. We assess our approach through two experiments: first, a method named Modality-Conversion that employs automatic speech recognition (ASR) systems, followed by a text classifier; second, a method named Modality-Conversion++ that assumes perfect ASR output in order to investigate the impact of modality conversion on SER. Our findings indicate that the first method yields substantial results, while the second method outperforms state-of-the-art (SOTA) speech-based approaches in terms of SER weighted-F1 (WF1) score on the MELD dataset. This research highlights the potential of modality conversion for tasks that can be conducted in alternative modalities.
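The two experiments in the abstract differ only in where the text comes from: an ASR system in one case, a reference transcript in the other. The sketch below illustrates that structure under stated assumptions. The function names, `toy_asr`, and `toy_classifier` are hypothetical stand-ins; the abstract does not specify the actual ASR systems or text classifier, so this is not the authors' implementation.

```python
from typing import Callable, Sequence

# Assumed interfaces: any ASR system maps audio samples to text,
# and any text classifier maps text to an emotion label.
Transcriber = Callable[[Sequence[float]], str]
TextClassifier = Callable[[str], str]


def modality_conversion(audio: Sequence[float],
                        asr: Transcriber,
                        classify: TextClassifier) -> str:
    """Speech -> text (ASR) -> emotion label."""
    transcript = asr(audio)
    return classify(transcript)


def modality_conversion_pp(reference_transcript: str,
                           classify: TextClassifier) -> str:
    """Same text classifier, but fed a reference transcript (perfect ASR assumed)."""
    return classify(reference_transcript)


# Toy stand-ins so the sketch runs without any model downloads.
def toy_asr(audio: Sequence[float]) -> str:
    return "i am so happy for you"  # a real ASR system would decode the audio


def toy_classifier(text: str) -> str:
    lowered = text.lower()
    if "happy" in lowered:
        return "joy"
    if "angry" in lowered:
        return "anger"
    return "neutral"


print(modality_conversion([0.0, 0.1, -0.1], toy_asr, toy_classifier))
print(modality_conversion_pp("I am angry about this", toy_classifier))
```

Keeping the classifier identical across both functions isolates the effect of transcription errors, which is the comparison the two experiments are set up to make.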

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
