CL · SD · AS · Jul 1, 2022

Building African Voices

arXiv:2207.00688v1 · 15 citations · h-index: 91
Originality: Synthesis-oriented
AI Analysis

This work addresses the lack of speech data for African languages, enabling researchers and developers to build TTS systems, though its contribution is incremental: it applies existing methods to new data.

The paper tackles the problem of speech synthesis for low-resourced African languages by creating datasets and TTS systems with minimal resources, demonstrating intelligible speech generation using only 25 minutes of created speech recorded in suboptimal environments.

Modern speech synthesis techniques can produce natural-sounding speech given sufficient high-quality data and compute resources. However, such data is not readily available for many languages. This paper focuses on speech synthesis for low-resourced African languages, from corpus creation to sharing and deploying the Text-to-Speech (TTS) systems. We first create a set of general-purpose instructions on building speech synthesis systems with minimum technological resources and subject-matter expertise. Next, we create new datasets and curate datasets from "found" data (existing recordings) through a participatory approach while considering accessibility, quality, and breadth. We demonstrate that we can develop synthesizers that generate intelligible speech with 25 minutes of created speech, even when recorded in suboptimal environments. Finally, we release the speech data, code, and trained voices for 12 African languages to support researchers and developers.

Code Implementations: 1 repo
