CL, LG, SD, AS; Feb 24, 2023

Pre-Finetuning for Few-Shot Emotional Speech Recognition

arXiv:2302.12921v3, 6 citations, h-index: 44
Originality: Incremental advance
AI Analysis

This paper addresses speaker adaptation for emotional speech recognition, but the contribution appears incremental: it builds on existing pre-trained models and few-shot learning methods.

The paper tackles the poor generalization of speech models to out-of-domain speakers by proposing pre-finetuning on emotional speech tasks, and it evaluates the pre-finetuned models through 33,600 few-shot fine-tuning trials on the Emotional Speech Dataset.

Speech models have long been known to overfit individual speakers for many classification tasks. This leads to poor generalization in settings where the speakers are out-of-domain or out-of-distribution, as is common in production environments. We view speaker adaptation as a few-shot learning problem and propose investigating transfer learning approaches inspired by recent success with pre-trained models in natural language tasks. We propose pre-finetuning speech models on difficult tasks to distill knowledge into few-shot downstream classification objectives. We pre-finetune Wav2Vec2.0 on every permutation of four multiclass emotional speech recognition corpora and evaluate our pre-finetuned models through 33,600 few-shot fine-tuning trials on the Emotional Speech Dataset.
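As a rough illustration of what "every permutation of four corpora" could mean in practice, the sketch below enumerates candidate pre-finetuning orders. The corpus names are hypothetical placeholders, and the abstract does not say whether orderings of smaller subsets are included, so the helper (an illustrative name, not from the paper's code) exposes both readings.

```python
from itertools import permutations

# Hypothetical placeholder names; the abstract does not list the four corpora.
CORPORA = ["corpus_a", "corpus_b", "corpus_c", "corpus_d"]


def pre_finetuning_orders(corpora, include_subsets=False):
    """Return ordered sequences of corpora to pre-finetune on.

    include_subsets=False: only orderings that use all corpora.
    include_subsets=True: orderings of every non-empty subset as well
    (an assumption; the abstract does not specify this).
    """
    sizes = range(1, len(corpora) + 1) if include_subsets else [len(corpora)]
    return [order for k in sizes for order in permutations(corpora, k)]


full_orders = pre_finetuning_orders(CORPORA)
all_orders = pre_finetuning_orders(CORPORA, include_subsets=True)

print(len(full_orders))  # 24 orderings of all four corpora (4!)
print(len(all_orders))   # 64 orderings across subset sizes 1 to 4 (4 + 12 + 24 + 24)
```

Each tuple would correspond to one pre-finetuned model: the base model is trained on the corpora in the listed order before few-shot fine-tuning on the downstream dataset.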

Code Implementations: 1 repo
