Giving Voice to the Constitution: Low-Resource Text-to-Speech for Quechua and Spanish Using a Bilingual Legal Corpus
This work addresses the lack of TTS resources for indigenous languages such as Quechua in the legal domain, but its contribution is incremental: it applies existing methods to a new bilingual corpus.
The authors develop a unified text-to-speech pipeline for Quechua and Spanish using three TTS architectures, leveraging cross-lingual transfer to improve synthesis quality in both languages despite data scarcity in Quechua. They release trained models and synthesized audio for the Peruvian Constitution.
We present a unified pipeline for synthesizing high-quality Quechua and Spanish speech for the Peruvian Constitution using three state-of-the-art text-to-speech (TTS) architectures: XTTS v2, F5-TTS, and DiFlow-TTS. Our models are trained on independent Spanish and Quechua speech datasets with heterogeneous sizes and recording conditions, and leverage bilingual and multilingual TTS capabilities to improve synthesis quality in both languages. By exploiting cross-lingual transfer, our framework mitigates data scarcity in Quechua while preserving naturalness in Spanish. We release trained checkpoints, inference code, and synthesized audio for each constitutional article, providing a reusable resource for speech technologies in indigenous and multilingual contexts. This work contributes to the development of inclusive TTS systems for political and legal content in low-resource settings.
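The per-article synthesis step described above can be sketched as a loop over models, articles, and languages. This is a minimal illustration and not the authors' released inference code. The names `Article`, `synthesize_corpus`, and `silent_backend`, the language codes `"es"` and `"qu"`, and the output directory layout are all assumptions made for the example. A real backend would wrap inference for XTTS v2, F5-TTS, or DiFlow-TTS, which is not shown here. The placeholder backend only emits silence so that the sketch runs with the standard library alone.

```python
import io
import wave
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, Iterable, List

# Hypothetical backend signature: (text, language code) -> WAV file bytes.
SynthFn = Callable[[str, str], bytes]


@dataclass(frozen=True)
class Article:
    """One constitutional article with its text in each language (assumed layout)."""
    number: int
    text: Dict[str, str]  # language code -> article text, e.g. {"es": ..., "qu": ...}


def silent_backend(text: str, lang: str) -> bytes:
    """Placeholder for a real TTS model: returns silence sized by text length."""
    n_frames = 160 * max(1, len(text))  # arbitrary: 10 ms of 16 kHz audio per character
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(16000)  # 16 kHz
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def synthesize_corpus(
    articles: Iterable[Article],
    backends: Dict[str, SynthFn],
    out_dir: str,
) -> List[Path]:
    """Write one WAV per (model, language, article) under out_dir."""
    articles = list(articles)  # allow re-iteration for each backend
    written: List[Path] = []
    for model_name, synth in backends.items():
        for article in articles:
            for lang, text in sorted(article.text.items()):
                path = (
                    Path(out_dir) / model_name / lang
                    / f"article_{article.number:03d}.wav"
                )
                path.parent.mkdir(parents=True, exist_ok=True)
                path.write_bytes(synth(text, lang))
                written.append(path)
    return written
```

Calling `synthesize_corpus([Article(1, {"es": "...", "qu": "..."})], {"model_a": silent_backend}, "out")` would write `out/model_a/es/article_001.wav` and `out/model_a/qu/article_001.wav`. Keeping the backend behind a plain function signature lets the same loop serve all three architectures without changes.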