Cross Lingual Cross Corpus Speech Emotion Recognition
This work addresses the challenge of improving SER systems for multilingual and cross-corpus applications, though the contribution is incremental, since it builds on existing multi-task learning frameworks.
The paper addresses the poor performance of speech emotion recognition (SER) models in cross-corpus and cross-language settings. It evaluates on four languages and introduces language ID as an auxiliary task in multi-task learning, resulting in improved generalization for emotion recognition.
The majority of existing speech emotion recognition models are trained and evaluated on a single corpus in a single language. These systems do not perform as well when applied in cross-corpus and cross-language scenarios. This paper presents speech emotion recognition results for four languages in both single-corpus and cross-corpus settings. Additionally, since multi-task learning (MTL) with gender, naturalness and arousal as auxiliary tasks has been shown to enhance the generalisation capabilities of emotion models, this paper introduces language ID as another auxiliary task in the MTL framework to explore the role of spoken language in emotion recognition, which has not yet been studied.
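To make the MTL setup concrete, the following is a minimal sketch of how an auxiliary language-ID objective could be combined with the primary emotion objective. The abstract does not specify the architecture or the loss formulation, so everything here is an assumption for illustration: the weighted sum of cross-entropy losses, the function names (`cross_entropy`, `mtl_loss`), the task keys, the number of emotion classes, and the auxiliary weights are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability of the target class index."""
    return -math.log(softmax(logits)[target])


def mtl_loss(task_logits, task_targets, aux_weights):
    """Primary emotion loss plus a weighted sum of auxiliary task losses.

    Assumption: a simple weighted sum; the paper may combine tasks differently.
    """
    loss = cross_entropy(task_logits["emotion"], task_targets["emotion"])
    for task, weight in aux_weights.items():
        loss += weight * cross_entropy(task_logits[task], task_targets[task])
    return loss


# Hypothetical per-utterance outputs from task-specific heads on a shared encoder.
logits = {
    "emotion": [2.0, 0.5, -1.0, 0.1],      # 4 emotion classes (assumed count)
    "language_id": [0.2, 1.5, -0.3, 0.0],  # one score per language (4 languages)
    "gender": [1.0, -1.0],
}
targets = {"emotion": 0, "language_id": 1, "gender": 0}
aux_weights = {"language_id": 0.5, "gender": 0.5}  # hypothetical weights

total = mtl_loss(logits, targets, aux_weights)
emotion_only = mtl_loss(logits, targets, {})
```

With an empty `aux_weights` mapping, `mtl_loss` reduces to the single-task emotion loss, which gives a direct baseline for measuring what the auxiliary tasks contribute.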