CL AI Sep 22, 2025

TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation

Yutong Liu, Ziyue Zhang, Ban Ma-bao, Renzeng Duojie, Yuqing Cai, Yongbin Yu, Xiangxiang Wang, Fan Gao, Cheng Huang, Nyima Tashi

arXiv:2509.18060v18.32 citations h-index: 17

Originality: Incremental advance

AI Analysis

This work addresses the challenge of low-resource speech processing for Tibetan by enabling multi-dialect speech synthesis, though the advance is incremental because it builds on existing TTS methods.

The paper tackles the problem of limited parallel speech corpora for Tibetan dialects by proposing TMD-TTS, a unified text-to-speech framework that synthesizes dialectal speech. In the reported objective and subjective evaluations, it significantly outperforms baselines in dialectal expressiveness.

Tibetan is a low-resource language with limited parallel speech corpora spanning its three major dialects (Ü-Tsang, Amdo, and Kham), limiting progress in speech modeling. To address this issue, we propose TMD-TTS, a unified Tibetan multi-dialect text-to-speech (TTS) framework that synthesizes parallel dialectal speech from explicit dialect labels. Our method features a dialect fusion module and a Dialect-Specialized Dynamic Routing Network (DSDR-Net) to capture fine-grained acoustic and linguistic variations across dialects. Extensive objective and subjective evaluations demonstrate that TMD-TTS significantly outperforms baselines in dialectal expressiveness. We further validate the quality and utility of the synthesized speech through a challenging Speech-to-Speech Dialect Conversion (S2SDC) task.
