CL · SD · AS · Jun 10, 2025

Multi-Teacher Language-Aware Knowledge Distillation for Multilingual Speech Emotion Recognition

arXiv:2506.08717v1 · 1 citation · h-index: 6 · Has Code · INTERSPEECH
Originality: Incremental advance
AI Analysis

This work addresses multilingual speech emotion recognition, a capability relevant to human-computer interaction, and represents an incremental advance over existing methods.

The paper builds a multilingual speech emotion recognition system using a language-aware multi-teacher knowledge distillation method. It reports state-of-the-art performance: a weighted recall of 72.9 on the English dataset and an unweighted recall of 63.4 on the Finnish dataset.

Speech Emotion Recognition (SER) is crucial for improving human-computer interaction. Despite strides in monolingual SER, extending them to build a multilingual system remains challenging. Our goal is to train a single model capable of multilingual SER by distilling knowledge from multiple teacher models. To address this, we introduce a novel language-aware multi-teacher knowledge distillation method to advance SER in English, Finnish, and French. It leverages Wav2Vec2.0 as the foundation of monolingual teacher models and then distills their knowledge into a single multilingual student model. The student model demonstrates state-of-the-art performance, with a weighted recall of 72.9 on the English dataset and an unweighted recall of 63.4 on the Finnish dataset, surpassing fine-tuning and knowledge distillation baselines. Our method excels in improving recall for sad and neutral emotions, although it still faces challenges in recognizing anger and happiness.
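
The abstract does not spell out the distillation objective, so the following is only a minimal sketch of one common way a language-aware multi-teacher loss could be set up, not the paper's actual formulation. It assumes a standard temperature-scaled distillation loss (cross-entropy on the hard label plus KL divergence to the teacher's softened output) and assumes that "language-aware" means each utterance is matched with the monolingual teacher for its own language. The function names, the `temperature` and `alpha` parameters, and the example logits are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def language_aware_kd_loss(student_logits, teacher_logits_by_lang, lang,
                           label, temperature=2.0, alpha=0.5):
    """Hypothetical loss: hard-label cross-entropy plus distillation
    from the teacher that matches the utterance's language.

    Assumption: the teacher is chosen by language ID. The paper may
    weight or combine teachers differently.
    """
    # Route to the monolingual teacher for this utterance's language.
    teacher_logits = teacher_logits_by_lang[lang]

    # Soft-target term: match the teacher's softened distribution.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    distill = (temperature ** 2) * kl_divergence(p_teacher, p_student_soft)

    # Hard-target term: ordinary cross-entropy on the emotion label.
    p_student = softmax(student_logits)
    cross_entropy = -math.log(p_student[label] + 1e-12)

    return alpha * cross_entropy + (1.0 - alpha) * distill


# Made-up logits over four emotion classes for two hypothetical teachers.
teachers = {
    "en": [2.0, 0.5, -1.0, 0.0],
    "fi": [-1.0, 0.0, 2.0, 0.5],
}
student = [2.0, 0.5, -1.0, 0.0]

loss_en = language_aware_kd_loss(student, teachers, "en", label=0)
loss_fi = language_aware_kd_loss(student, teachers, "fi", label=0)
```

In this toy setup the student's logits equal the English teacher's, so the distillation term vanishes for `"en"` and `loss_en` is smaller than `loss_fi`. In a real system the logits would come from Wav2Vec2.0-based models rather than hand-written lists.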

Code Implementations: 1 repo