CL · SD · AS · May 8, 2021

Robustness of end-to-end Automatic Speech Recognition Models -- A Case Study using Mozilla DeepSpeech

arXiv:2105.09742v1671 citations
AI Analysis

This work highlights potential overestimation of ASR model performance for researchers and practitioners, though it is incremental in nature.

The study investigated the robustness of end-to-end automatic speech recognition models by analyzing factors like selection bias, gender, and content overlap, finding that content overlap has the biggest impact on performance.

When evaluating automatic speech recognition models, performance is usually reported as word error rate on a particular dataset. Reporting realistic performance numbers requires a careful understanding of that dataset. We argue that many reported performance numbers probably underestimate the expected error rate. We conduct experiments controlling for selection bias, gender, and overlap between training and test data in content, voices, and recording conditions. We find that content overlap has the biggest impact, but other factors such as gender also play a role.
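As a rough illustration of the two quantities the abstract relies on, the sketch below computes word error rate as word-level edit distance divided by the number of reference words, and estimates content overlap as the share of test transcripts whose text also appears in the training transcripts. This is not the authors' evaluation code. The function names `word_error_rate` and `content_overlap`, the whitespace tokenization, and the exact-match definition of overlap are all assumptions made for the example.

```python
from typing import Iterable


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are split on whitespace, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


def content_overlap(train_transcripts: Iterable[str],
                    test_transcripts: Iterable[str]) -> float:
    """Fraction of test transcripts whose text also occurs in training.

    Assumption: overlap means an exact match after lowercasing and
    trimming surrounding whitespace.
    """
    seen = {t.strip().lower() for t in train_transcripts}
    test = [t.strip().lower() for t in test_transcripts]
    if not test:
        raise ValueError("test_transcripts must not be empty")
    return sum(t in seen for t in test) / len(test)
```

A high `content_overlap` value signals that a word error rate measured on the test split may be optimistic for unseen sentences, which is the effect the abstract describes.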
