CL · AI · Apr 12, 2022

ASR in German: A Detailed Error Analysis

arXiv:2204.05617v1 · 7 citations · h-index: 12
Originality: Synthesis-oriented
AI Analysis

This work addresses the opacity of error analysis in ASR systems for German. Its contribution appears incremental: it categorizes errors rather than proposing a fundamentally new method.

This paper analyzes error patterns in German automatic speech recognition (ASR) by evaluating multiple neural network architectures on diverse test datasets. It identifies errors shared across architectures, traces them to training data and other sources, and proposes ways to build better datasets and more robust systems.

The number of freely available systems for automatic speech recognition (ASR) based on neural networks is growing steadily, and their predictions are becoming increasingly reliable. However, trained models are typically evaluated exclusively with statistical metrics such as word error rate (WER) or character error rate (CER), which provide no insight into the nature or impact of the errors produced when predicting transcripts from speech input. This work presents a selection of ASR model architectures that are pretrained on German and evaluates them on a benchmark of diverse test datasets. It identifies cross-architectural prediction errors, classifies them into categories, and traces the sources of errors per category back to the training data as well as other sources. Finally, it discusses solutions for creating qualitatively better training datasets and more robust ASR systems.
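To make the limitation of WER and CER concrete, here is a minimal sketch of both metrics based on Levenshtein edit distance. It is not the paper's evaluation code. The function names and the German example sentences are assumptions chosen for illustration, and no text normalization (casing, punctuation) is applied.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical transcripts: one spelling variant, one change of meaning.
reference = "das Haus ist groß"
hyp_a = "das Haus ist gross"  # orthographic variant of the same word
hyp_b = "das Haus ist grün"   # a different word with a different meaning

for hyp in (hyp_a, hyp_b):
    print(f"{hyp!r}: WER={wer(reference, hyp):.2f} CER={cer(reference, hyp):.2f}")
```

Both hypotheses differ from the reference in one of four words, so both receive a WER of 0.25, although only the second changes the meaning of the sentence. Telling such cases apart requires the kind of categorized error analysis the abstract describes.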
