CL · LG · SD · AS · Jul 31, 2019

Personalizing ASR for Dysarthric and Accented Speech with Limited Data

arXiv:1907.13511v1 · 128 citations
Originality: Incremental advance
AI Analysis

This work addresses ASR performance disparities for users with dysarthria or accents and offers a practical solution that needs only limited data. It is incremental because it builds on existing finetuning methods.

The paper tackles the problem of improving automatic speech recognition (ASR) for underrepresented groups with non-standard speech, specifically dysarthric speech from people with ALS and accented speech, by developing personalized finetuning techniques. These achieve relative WER improvements of 62% (ALS speech) and 35% (accented speech), and reduce the absolute WER for ALS speakers on a test set of message bank phrases to 10% for mild dysarthria and 20% for more serious dysarthria.
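The summary mixes relative and absolute word error rates (WER), which are easy to confuse. The sketch below shows how both are commonly computed: WER as word-level edit distance divided by the reference length, and relative improvement as the fraction of the baseline WER that was removed. It also includes a hypothetical `share_of_improvement` helper for statements like "71% of the improvement comes from only 5 minutes of training data". All function names and the example numbers are illustrative assumptions, not values or code from the paper.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + deletions + insertions (word-level Levenshtein)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of ref[i-1]
                          curr[j - 1] + 1,     # insertion of hyp[j-1]
                          prev[j - 1] + cost)  # substitution or exact match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; assumes a non-empty reference transcript."""
    return word_errors(reference, hypothesis) / len(reference.split())


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed, e.g. 0.62 for a 62% relative gain."""
    return (baseline_wer - new_wer) / baseline_wer


def share_of_improvement(baseline_wer: float, partial_wer: float, full_wer: float) -> float:
    """Hypothetical helper: how much of the full gain a smaller training set achieves."""
    return (baseline_wer - partial_wer) / (baseline_wer - full_wer)


# Usage with made-up numbers:
print(wer("call my sister please", "call my sisters"))  # one substitution + one deletion over 4 words
print(relative_improvement(0.40, 0.15))                 # roughly 0.625
```

Note that a large relative improvement can still leave a high absolute WER when the baseline is very poor, which is why the paper reports both.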

Automatic speech recognition (ASR) systems have improved dramatically over the last few years. ASR systems are most often trained on 'typical' speech, which means that underrepresented groups don't experience the same level of improvement. In this paper, we present and evaluate finetuning techniques to improve ASR for users with non-standard speech. We focus on two types of non-standard speech: speech from people with amyotrophic lateral sclerosis (ALS) and accented speech. We train personalized models that achieve 62% and 35% relative WER improvement on these two groups, bringing the absolute WER for ALS speakers, on a test set of message bank phrases, down to 10% for mild dysarthria and 20% for more serious dysarthria. We show that 71% of the improvement comes from only 5 minutes of training data. Finetuning a particular subset of layers (with many fewer parameters) often gives better results than finetuning the entire model. This is the first step towards building state-of-the-art ASR models for dysarthric speech.
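The abstract reports that finetuning only a subset of layers often beats finetuning the whole model. The dependency-free toy below illustrates the mechanics of that idea: gradients are computed through every layer, but only layers listed in `trainable` are updated, so the rest keep their pretrained values. The model (a stack of scalar affine layers), the `finetune` helper, and the "speaker" data are all hypothetical; this is not the paper's ASR architecture, and which layers the authors chose is not stated here. In a framework such as PyTorch, the same effect is usually achieved by setting `requires_grad = False` on the frozen parameters.

```python
# Hypothetical toy "model": a stack of scalar affine layers, each computing w * x + b.

def forward(params, x):
    for w, b in params:
        x = w * x + b
    return x


def mse(params, data):
    return sum((forward(params, x) - y) ** 2 for x, y in data) / len(data)


def finetune(params, data, trainable, lr=0.05, epochs=200):
    """Per-example SGD on squared error, updating only layer indices in `trainable`."""
    params = [list(p) for p in params]  # copy so the base model is left untouched
    for _ in range(epochs):
        for x, y in data:
            # Forward pass, caching each layer's input for the backward pass.
            inputs, h = [], x
            for w, b in params:
                inputs.append(h)
                h = w * h + b
            grad_out = 2 * (h - y)  # d(loss)/d(output of last layer)
            # Backward pass from the last layer to the first.
            for k in reversed(range(len(params))):
                w, _ = params[k]
                grad_w = grad_out * inputs[k]
                grad_b = grad_out
                grad_out = grad_out * w  # gradient w.r.t. this layer's input
                if k in trainable:       # frozen layers still pass gradients through
                    params[k][0] -= lr * grad_w
                    params[k][1] -= lr * grad_b
    return [tuple(p) for p in params]


# Usage: identity layers stand in for a "pretrained" model; only layer 0 is adapted.
base = [(1.0, 0.0)] * 3
speaker_data = [(x, 2 * x + 1) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
tuned = finetune(base, speaker_data, trainable={0})
print(mse(base, speaker_data), mse(tuned, speaker_data), tuned)
```

Here adapting one layer is enough to fit the new data while the other layers stay exactly as they were. With very little personal data, updating fewer parameters also limits how far the model can drift from what it learned during pretraining.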

Code Implementations: 1 repo
