SD CL AS | Jun 11, 2021

HUI-Audio-Corpus-German: A high quality TTS dataset

Pascal Puchtler, Johannes Wirth, René Peinl

arXiv:2106.06309v1 | 28 citations | Has Code

Originality: Synthesis-oriented

AI Analysis

The dataset provides a valuable resource for German TTS development, addressing a gap in data availability for this language.

The authors tackle the problem of limited and low-quality German text-to-speech datasets by introducing the HUI-Audio-Corpus-German, a large, open-source dataset built with a processing pipeline that produces high-quality audio-to-transcription alignments and reduces the manual effort needed to create it.

The increasing availability of audio data on the internet has led to a multitude of datasets for the development and training of text-to-speech applications based on neural networks. Highly variable voice quality, low sampling rates, a lack of text normalization, and poor alignment of audio samples to the corresponding transcript sentences still limit the performance of deep neural networks trained on this task. Additionally, data resources in languages like German are still very limited. We introduce the "HUI-Audio-Corpus-German", a large, open-source dataset for TTS engines, created with a processing pipeline that produces high-quality audio-to-transcription alignments and decreases the manual effort needed for creation.
