CL · SD · AS · Sep 10, 2024

Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach

arXiv:2409.13734v2 · h-index: 7
Originality: Synthesis-oriented
AI Analysis

This work addresses the scarcity of TTS resources for low-resource languages such as Central Kurdish, though its contribution is incremental: it adapts existing methods to a new language.

The paper tackles text-to-speech synthesis for low-resource Central Kurdish by training a WaveGlow vocoder on a native 21-hour corpus instead of using a model pre-trained on English. The resulting system achieves a mean opinion score (MOS) of 4.91, setting a new benchmark for Kurdish speech synthesis.
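The summary does not state the vocoder's training objective. Assuming the authors kept the standard formulation from the original WaveGlow paper, retraining on the Kurdish corpus means minimizing the negative log-likelihood of Kurdish audio conditioned on its mel-spectrogram. Here x is a waveform, m its mel-spectrogram, z(x) the latent produced by the invertible flow, σ² the variance of the spherical Gaussian prior, s_{j,i} the i-th scale output of the j-th affine coupling layer, and W_k the weight matrix of the k-th invertible 1×1 convolution:

```latex
\mathcal{L}(\theta) = -\log p_\theta(x \mid m)
  = \frac{z(x)^\top z(x)}{2\sigma^2}
  - \sum_{j} \sum_{i} \log s_{j,i}(x, m)
  - \sum_{k} \log \left| \det W_k \right|
  + \text{const}
```

Under this assumption, switching from the English checkpoint to native training presumably leaves the architecture unchanged and alters only the (x, m) pairs over which the loss is averaged.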

Advances in text-to-speech (TTS) technology have greatly improved access to digital content by synthesizing spoken language from text. However, developing effective TTS for low-resource languages such as Central Kurdish (CKB) remains challenging, mainly because of the lack of linguistic information and dedicated resources. In this paper, we improve a Tacotron-based Kurdish TTS system by training a Kurdish WaveGlow vocoder on a 21-hour Central Kurdish speech corpus instead of using a WaveGlow vocoder pre-trained on English. Training the vocoder on a target-language corpus is necessary to capture the phonetic and prosodic variation of Kurdish accurately and fluently. These enhancements prove effective: our model significantly outperforms the baseline system that relies on English pre-trained models. In particular, our adapted WaveGlow model achieves a MOS of 4.91, setting a new benchmark for Kurdish speech synthesis. This study advances TTS capabilities for Central Kurdish and opens the door to further development for other Kurdish dialects and related languages.
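The headline result is a MOS. The summary does not describe the listening-test protocol, so the following is only a minimal sketch of how a MOS is conventionally computed: the arithmetic mean of listeners' 1-5 ratings, reported here with an approximate 95% confidence interval (normal approximation). The function name `mean_opinion_score` and the ratings are hypothetical and are not the paper's data.

```python
import math
import statistics


def mean_opinion_score(ratings: list[int]) -> tuple[float, float]:
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Hypothetical helper: each rating is one listener's 1-5 score for one
    utterance. The interval uses a normal approximation (1.96 * standard error).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate a confidence interval")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.fmean(ratings)  # arithmetic mean of all ratings
    std_err = statistics.stdev(ratings) / math.sqrt(len(ratings))  # sample std / sqrt(n)
    return mos, 1.96 * std_err


# Made-up ratings for illustration only; these are NOT the paper's results.
ratings = [5, 5, 5, 4, 5, 5, 5, 5, 5, 5]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} ± {ci:.2f}")  # prints: MOS = 4.90 ± 0.20
```

For small listener panels, replacing 1.96 with the Student's t critical value for n - 1 degrees of freedom gives a somewhat wider and more appropriate interval.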

