ASCL | May 22, 2025

Meta-PerSER: Few-Shot Listener Personalized Speech Emotion Recognition via Meta-learning

arXiv:2505.16220v1 | 1 citation | h-index: 11 | INTERSPEECH
Originality: Incremental advance
AI Analysis

The paper addresses inconsistent predictions in speech emotion recognition by personalizing models to individual listeners. The advance is incremental, since it builds on existing meta-learning and self-supervised techniques.

The paper tackles the problem of personalized speech emotion recognition by adapting to individual listeners' interpretations, achieving significant performance improvements over baseline methods on the IEMOCAP corpus in both seen and unseen data scenarios.

This paper introduces Meta-PerSER, a novel meta-learning framework that personalizes Speech Emotion Recognition (SER) by adapting to each listener's unique way of interpreting emotion. Conventional SER systems rely on aggregated annotations, which often overlook individual subtleties and lead to inconsistent predictions. In contrast, Meta-PerSER leverages a Model-Agnostic Meta-Learning (MAML) approach enhanced with Combined-Set Meta-Training, Derivative Annealing, and per-layer per-step learning rates, enabling rapid adaptation with only a few labeled examples. By integrating robust representations from pre-trained self-supervised models, our framework first captures general emotional cues and then fine-tunes itself to personal annotation styles. Experiments on the IEMOCAP corpus demonstrate that Meta-PerSER significantly outperforms baseline methods in both seen and unseen data scenarios, highlighting its promise for personalized emotion recognition.
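The adaptation mechanism named in the abstract can be illustrated with a toy sketch. What follows is not the authors' implementation: it assumes a two-parameter linear regressor in place of a speech model, treats each hypothetical "listener" as a private rule y = w*x + b, and keeps the per-layer per-step learning rates as fixed values rather than learned ones. Derivative annealing is interpreted here as it is commonly described for MAML-style training (first-order meta-gradients in early epochs, second-order afterwards); the abstract does not spell out the schedule, so `anneal_epoch` and every other name and constant below is an assumption. Combined-Set Meta-Training is not modeled.

```python
import random

def loss_grad_hess(theta, xs, ys):
    """Mean squared error for y ~ w*x + b: return (loss, gradient, Hessian)."""
    w, b = theta
    n = len(xs)
    errs = [w * x + b - y for x, y in zip(xs, ys)]
    loss = sum(e * e for e in errs) / n
    grad = [2 * sum(e * x for e, x in zip(errs, xs)) / n, 2 * sum(errs) / n]
    mx = sum(xs) / n
    mxx = sum(x * x for x in xs) / n
    hess = [[2 * mxx, 2 * mx], [2 * mx, 2.0]]
    return loss, grad, hess

def adapt(theta, xs, ys, lrs):
    """Inner loop: lrs[k][i] is the learning rate of 'layer' i at step k."""
    for step_lrs in lrs:
        _, g, _ = loss_grad_hess(theta, xs, ys)
        theta = [p - a * gi for p, a, gi in zip(theta, step_lrs, g)]
    return theta

def meta_gradient(theta, support, query, lrs, second_order):
    """Gradient of the post-adaptation query loss w.r.t. the initial theta."""
    adapted = adapt(theta, *support, lrs)
    q_loss, v, _ = loss_grad_hess(adapted, *query)
    if second_order:
        # Backpropagate through each inner step: v <- (I - H A_k) v.
        # The support loss is quadratic, so its Hessian H is constant.
        _, _, h = loss_grad_hess(theta, *support)
        for step_lrs in reversed(lrs):
            av = [a * vi for a, vi in zip(step_lrs, v)]
            hav = [h[0][0] * av[0] + h[0][1] * av[1],
                   h[1][0] * av[0] + h[1][1] * av[1]]
            v = [vi - hi for vi, hi in zip(v, hav)]
    # With second_order=False, v is the first-order approximation.
    return q_loss, v

def sample_task(rng, n_support=5, n_query=10):
    """Hypothetical 'listener': a private linear rule y = w*x + b."""
    w, b = rng.uniform(0.5, 1.5), rng.uniform(-1.0, 1.0)
    def draw(n):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        return xs, [w * x + b for x in xs]
    return draw(n_support), draw(n_query)

def meta_train(epochs=200, tasks_per_epoch=8, meta_lr=0.05,
               anneal_epoch=50, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    lrs = [[0.10, 0.05], [0.05, 0.02]]  # 2 inner steps x 2 "layers" (w, b)
    for epoch in range(epochs):
        second_order = epoch >= anneal_epoch  # assumed annealing schedule
        total = [0.0, 0.0]
        for _ in range(tasks_per_epoch):
            support, query = sample_task(rng)
            _, g = meta_gradient(theta, support, query, lrs, second_order)
            total = [t + gi for t, gi in zip(total, g)]
        theta = [p - meta_lr * t / tasks_per_epoch
                 for p, t in zip(theta, total)]
    return theta, lrs
```

Calling `theta, lrs = meta_train()` and then `adapt(theta, xs, ys, lrs)` on a handful of labeled examples from a new task mirrors the few-shot personalization step in miniature. In a real system the Hessian products would come from automatic differentiation rather than a closed form.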
