AS, CL | May 25, 2023

Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration

Rustem Yeshpanov, Saida Mussakhojayeva, Yerbolat Khassanov

arXiv:2305.15749v1 | 4.37 citations | Has Code

Originality: Synthesis-oriented

AI Analysis

The paper addresses speech synthesis for low-resource Turkic languages. Its contribution is incremental: it applies an existing method to new data by means of transliteration.

The work builds a multilingual text-to-speech synthesis system for ten lower-resourced Turkic languages using a zero-shot learning approach, and reports promising results in subjective evaluation.

This work aims to build a multilingual text-to-speech (TTS) synthesis system for ten lower-resourced Turkic languages: Azerbaijani, Bashkir, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Turkmen, Uyghur, and Uzbek. We specifically target the zero-shot learning scenario, where a TTS model trained using the data of one language is applied to synthesise speech for other, unseen languages. An end-to-end TTS system based on the Tacotron 2 architecture was trained using only the available data of the Kazakh language. To generate speech for the other Turkic languages, we first mapped the letters of the Turkic alphabets onto the symbols of the International Phonetic Alphabet (IPA), which were then converted to the Kazakh alphabet letters. To demonstrate the feasibility of the proposed approach, we evaluated the multilingual Turkic TTS model subjectively and obtained promising results. To enable replication of the experiments, we make our code and dataset publicly available in our GitHub repository.
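The two-stage transliteration described in the abstract (source letters to IPA symbols, then IPA symbols to Kazakh letters) can be sketched as follows. This is a minimal illustration, not the authors' released code: the function names `letters_to_ipa` and `ipa_to_letters` are hypothetical, and the two tables cover only a handful of Turkish letters with approximate IPA values chosen for the example. The longest-match lookup is an added assumption so that alphabets with multi-character graphemes could be handled by the same routine.

```python
# Toy mapping tables. These are illustrative assumptions covering a few
# Turkish letters only; they are NOT the tables published by the authors.
TURKISH_TO_IPA = {
    "a": "ɑ", "b": "b", "e": "e", "h": "h",
    "l": "l", "m": "m", "r": "r", "s": "s", "ş": "ʃ",
}
IPA_TO_KAZAKH = {
    "ɑ": "а", "b": "б", "e": "е", "h": "һ",
    "l": "л", "m": "м", "r": "р", "s": "с", "ʃ": "ш",
}


def letters_to_ipa(text, table):
    """Stage 1: map source-alphabet graphemes to IPA symbols.

    Uses greedy longest-match so that multi-character graphemes in the
    table take priority over their single-character prefixes.
    Input is assumed to be lowercase already.
    """
    max_len = max(len(key) for key in table)
    symbols = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                symbols.append(table[chunk])
                i += size
                break
        else:
            # Unknown character (space, punctuation, ...): pass it through.
            symbols.append(text[i])
            i += 1
    return symbols


def ipa_to_letters(symbols, table):
    """Stage 2: map each IPA symbol to a target-alphabet letter."""
    return "".join(table.get(symbol, symbol) for symbol in symbols)


ipa = letters_to_ipa("merhaba", TURKISH_TO_IPA)
kazakh = ipa_to_letters(ipa, IPA_TO_KAZAKH)
print(ipa)     # ['m', 'e', 'r', 'h', 'ɑ', 'b', 'ɑ']
print(kazakh)  # мерһаба
```

In a setup like the one the abstract describes, the Kazakh-script string produced by the second stage would be the input passed to the TTS model trained on Kazakh data. Keeping IPA as the intermediate step means each additional language needs only its own letter-to-IPA table.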
