CL, SD, AS · Oct 15, 2021

Scribosermo: Fast Speech-to-Text models for German and other Languages

arXiv:2110.07982v1 · 11 citations
Originality: Incremental advance
AI Analysis

The paper provides efficient speech-to-text solutions for non-English languages, particularly German, on low-resource hardware. The contribution is incremental, since it builds on existing transfer-learning methods.

The paper tackles two problems of current speech-to-text models, their heavy hardware requirements and their focus on English, by developing small, real-time models for German, Spanish, and French that run on low-power devices such as the Raspberry Pi. The models achieve competitive accuracy and outperform other solutions in German.

Recent Speech-to-Text models often require a large amount of hardware resources and are mostly trained in English. This paper presents Speech-to-Text models for German, as well as for Spanish and French with special features: (a) They are small and run in real-time on microcontrollers like a RaspberryPi. (b) Using a pretrained English model, they can be trained on consumer-grade hardware with a relatively small dataset. (c) The models are competitive with other solutions and outperform them in German. In this respect, the models combine advantages of other approaches, which only include a subset of the presented features. Furthermore, the paper provides a new library for handling datasets, which is focused on easy extension with additional datasets and shows an optimized way for transfer-learning new languages using a pretrained model from another language with a similar alphabet.
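The abstract mentions transfer-learning a new language from a pretrained model whose language has a similar alphabet. One common way to exploit a shared alphabet, shown here as a minimal sketch and not necessarily the exact procedure used in the paper, is to reuse the pretrained output-layer rows for characters both languages share and to initialize only the characters the new language adds (for German: ä, ö, ü, ß). The alphabets, the hypothetical function `transfer_output_layer`, the convention that the CTC blank symbol is the last output index, and plain Python lists standing in for a weight tensor are all assumptions for illustration.

```python
import random

# Hypothetical alphabets for illustration: the English model's output
# characters and a German alphabet that adds umlauts and ß.
ENGLISH = list(" abcdefghijklmnopqrstuvwxyz'")
GERMAN = list(" abcdefghijklmnopqrstuvwxyzäöüß'")


def transfer_output_layer(old_weights, old_alphabet, new_alphabet, hidden_size, seed=0):
    """Build an output-layer weight matrix for new_alphabet (hypothetical helper).

    old_weights has one row per character in old_alphabet plus a final row
    for the CTC blank symbol (assumption: blank is the last output index).
    Rows for characters shared with old_alphabet are copied; rows for new
    characters get small random values; the blank row is copied as well.
    """
    rng = random.Random(seed)
    old_index = {ch: i for i, ch in enumerate(old_alphabet)}
    new_weights = []
    for ch in new_alphabet:
        if ch in old_index:
            # Shared character: reuse what the pretrained model learned.
            new_weights.append(list(old_weights[old_index[ch]]))
        else:
            # New character: small random initialization, learned during fine-tuning.
            new_weights.append([rng.gauss(0.0, 0.01) for _ in range(hidden_size)])
    # Copy the CTC blank row, assumed to sit right after the old alphabet.
    new_weights.append(list(old_weights[len(old_alphabet)]))
    return new_weights


if __name__ == "__main__":
    hidden = 4
    # Dummy "pretrained" weights: row i is filled with the value i (+1 row for blank).
    pretrained = [[float(i)] * hidden for i in range(len(ENGLISH) + 1)]
    german_layer = transfer_output_layer(pretrained, ENGLISH, GERMAN, hidden)
    print(len(german_layer))  # 33: 32 German characters plus the blank
```

Only the rows for new characters start untrained, so fine-tuning can focus on them while the shared characters keep their pretrained mapping; this is one reason a relatively small dataset can suffice when the two alphabets overlap heavily.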

Code Implementations: 3 repos