MultiGen: Child-Friendly Multilingual Speech Generator with LLMs
This work addresses AI communication with young children in low-resource language contexts and represents an incremental advance in speech generation.
The paper tackles the challenge of generating high-quality, child-friendly speech for low-resource languages by proposing MultiGen, a multilingual speech generation model built on an LLM architecture, which outperformed baseline methods in experiments on Singaporean-accented Mandarin, Malay, and Tamil.
Generative speech models have demonstrated significant potential for improving human-machine interaction, with valuable real-world applications such as language learning for children. However, high-quality, child-friendly speech generation remains challenging, particularly for low-resource languages spanning diverse linguistic and cultural contexts. In this paper, we propose MultiGen, a multilingual speech generation model designed for child-friendly interaction that leverages an LLM architecture for speech generation in low-resource languages. MultiGen integrates age-appropriate multilingual speech generation to facilitate young children's communication with AI systems through culturally relevant context in three low-resource languages: Singaporean-accented Mandarin, Malay, and Tamil. Experimental results from both objective metrics and subjective evaluations show that MultiGen outperforms baseline methods.