CL · Jul 26, 2018

Open Source Automatic Speech Recognition for German

arXiv:1807.10311v1 · 40 citations
Originality: Synthesis-oriented
AI Analysis

This work provides a robust, open-source ASR model for German, lowering entry costs for applications and research. The contribution is incremental: it builds on existing methods and adds new training data.

The authors tackled the lack of freely available acoustic models for German automatic speech recognition by training models on 412 hours of Creative Commons-licensed data, achieving a 26% relative word error reduction and a best word error rate of 14.38% on the Tuda-De test set.

High-quality Automatic Speech Recognition (ASR) is a prerequisite for speech-based applications and research. While state-of-the-art ASR software is freely available, language-dependent acoustic models are lacking for languages other than English, due to the limited amount of freely available training data. We train acoustic models for German with Kaldi on two datasets, both of which are distributed under a Creative Commons license. The resulting model is freely redistributable, lowering the cost of entry for German ASR. The models are trained on a total of 412 hours of German read speech data, and we achieve a relative word error reduction of 26% by adding data from the Spoken Wikipedia Corpus to the previously best freely available German acoustic model recipe and dataset. Our best model achieves a word error rate of 14.38% on the Tuda-De test set. Due to the large number of speakers and the diversity of topics included in the training data, our model is robust against speaker variation and topic shift.
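The two metrics quoted in the abstract, word error rate and relative word error reduction, can be illustrated with a minimal sketch. It assumes the standard definition of WER (word-level edit distance divided by the number of reference words); the function names, the example sentences, and the baseline figure of 20.0 are hypothetical and are not taken from the paper or from Kaldi's scoring scripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical example: one deleted word ("ein") and one substitution.
wer = word_error_rate("das ist ein kleiner test", "das ist kleiner text")
print(f"WER: {wer:.1f}%")

# Hypothetical baseline and improved WER values, for illustration only.
print(f"Relative reduction: {relative_reduction(20.0, 15.0):.1f}%")
```

A relative reduction is measured against the baseline error rate, so it should not be read as a drop of the same number of absolute percentage points.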

Code Implementations: 2 repos
