ASSD Mar 4, 2020

Learning Fast Adaptation on Cross-Accented Speech Recognition

arXiv:2003.01901v1, 91 citations
AI Analysis

This work addresses the problem of accent variability in ASR for users of speech technology, presenting an incremental improvement through adaptation to unseen accents.

The paper tackles the challenge of training robust automatic speech recognition (ASR) systems across diverse accents by introducing a cross-accented English benchmark and proposing an accent-agnostic approach based on meta-learning. The method significantly outperforms joint training in zero-shot, few-shot, and all-shot settings, reducing word error rates in mixed-region and cross-region scenarios.
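Word error rate, the metric named above, is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the paper's evaluation code; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace; real ASR scoring
    usually also normalizes case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # consumed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words. Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.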

Local dialects lead people to pronounce words of the same language differently from each other. The great variability and complex characteristics of accents create a major challenge for training a robust and accent-agnostic automatic speech recognition (ASR) system. In this paper, we introduce a cross-accented English speech recognition task as a benchmark for measuring the ability of the model to adapt to unseen accents using the existing CommonVoice corpus. We also propose an accent-agnostic approach that extends the model-agnostic meta-learning (MAML) algorithm for fast adaptation to unseen accents. Our approach significantly outperforms joint training in the zero-shot, few-shot, and all-shot scenarios, in both the mixed-region and cross-region settings, in terms of word error rate.
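To make the MAML idea concrete, here is a toy sketch of one meta-update on linear regression, where each "task" loosely stands in for an accent. It is an illustration of the generic MAML update under stated assumptions, not the paper's ASR model or its extension: the helper names (`maml_step`, `make_task`, `post_adaptation_loss`), the learning rates, and the synthetic data are all hypothetical. For a linear model with squared error the second-order term is available in closed form, so the meta-gradient below is exact.

```python
import numpy as np

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

def mse_grad(w, X, y):
    # Gradient of mean squared error for a linear model.
    return 2.0 / len(y) * X.T @ (X @ w - y)

def maml_step(w, tasks, inner_lr=0.05, outer_lr=0.05):
    """One MAML meta-update over a batch of (support, query) tasks."""
    meta_grad = np.zeros_like(w)
    for Xs, ys, Xq, yq in tasks:
        # Inner loop: one gradient step on the task's support set.
        w_adapted = w - inner_lr * mse_grad(w, Xs, ys)
        # Outer gradient: differentiate the query loss through the inner
        # step. d(w_adapted)/dw = I - inner_lr * H, H = support-loss Hessian.
        hessian = 2.0 / len(ys) * Xs.T @ Xs
        jacobian = np.eye(len(w)) - inner_lr * hessian
        meta_grad += jacobian @ mse_grad(w_adapted, Xq, yq)
    return w - outer_lr * meta_grad / len(tasks)

def make_task(rng, true_w, n=20):
    # Hypothetical synthetic task: noiseless linear targets, split in half
    # into support (adaptation) and query (evaluation) sets.
    X = rng.normal(size=(n, len(true_w)))
    y = X @ true_w
    half = n // 2
    return X[:half], y[:half], X[half:], y[half:]

def post_adaptation_loss(w, tasks, inner_lr=0.05):
    # Average query loss after one adaptation step per task.
    losses = []
    for Xs, ys, Xq, yq in tasks:
        w_adapted = w - inner_lr * mse_grad(w, Xs, ys)
        losses.append(mse(w_adapted, Xq, yq))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
task_weights = [np.array([1.0, 2.0]), np.array([1.5, 1.5]), np.array([0.5, 2.5])]
tasks = [make_task(rng, tw) for tw in task_weights]

w = np.zeros(2)
loss_before = post_adaptation_loss(w, tasks)
for _ in range(200):
    w = maml_step(w, tasks)
loss_after = post_adaptation_loss(w, tasks)
```

The point of the design is that the meta-parameters `w` are trained to be a good starting point for adaptation rather than a good solution for any single task, which is why the quantity tracked is the loss after the inner step. Joint training, by contrast, would minimize the pooled loss at `w` directly.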

Code Implementations: 1 repo