CL · SD · AS · May 31, 2023

Text-to-Speech Pipeline for Swiss German -- A comparison

arXiv:2305.19750v1 · 1 citation
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of generating high-quality speech for Swiss German dialects. It represents an incremental improvement in a domain-specific area.

The authors tackled the problem of synthesizing Swiss German speech by evaluating several Text-to-Speech models. They found that VITS models performed best and reached a quality not previously achieved for different dialects.

In this work, we studied the synthesis of Swiss German speech using different Text-to-Speech (TTS) models. We evaluated the TTS models on three corpora and found that VITS models performed best, so we used them for further testing. We also introduce a new method to evaluate TTS models by letting the discriminator of a trained vocoder GAN model predict whether a given waveform is human or synthesized. In summary, our best model delivers speech synthesis for different Swiss German dialects with previously unachieved quality.
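The discriminator-based evaluation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the discriminator's predictions are aggregated, so everything here is an assumption: the `human_rate` helper, the 0.5 threshold, and the `toy_discriminator` stand-in are hypothetical and are not the authors' implementation. In practice the callable would wrap the discriminator of a trained vocoder GAN.

```python
from typing import Callable, Sequence

Waveform = Sequence[float]


def human_rate(
    discriminator: Callable[[Waveform], float],
    waveforms: Sequence[Waveform],
    threshold: float = 0.5,
) -> float:
    """Return the fraction of waveforms the discriminator scores as human.

    `discriminator` is assumed to return a score in [0, 1], where higher
    means "more likely human". The threshold is an illustrative choice.
    """
    if not waveforms:
        raise ValueError("need at least one waveform")
    judged_human = sum(1 for w in waveforms if discriminator(w) >= threshold)
    return judged_human / len(waveforms)


def toy_discriminator(waveform: Waveform) -> float:
    """Hypothetical stand-in: NOT a real vocoder discriminator.

    It calls a waveform "human" when its peak amplitude exceeds 0.5,
    purely so that the example runs without a trained model.
    """
    peak = max(abs(sample) for sample in waveform)
    return 1.0 if peak > 0.5 else 0.0


# Four made-up "synthesized" waveforms (three samples each).
synthesized = [
    [0.1, -0.2, 0.3],
    [0.6, -0.7, 0.2],
    [0.0, 0.9, -0.1],
    [0.2, 0.1, 0.0],
]

rate = human_rate(toy_discriminator, synthesized)
print(f"judged human: {rate:.2f}")
```

Under this reading, a TTS model whose output is more often judged human by the discriminator would rank higher. Whether the paper uses a thresholded rate or averages raw scores is not stated in the abstract.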

