CL SD AS

Nov 11, 2024

Building a Taiwanese Mandarin Spoken Language Model: A First Attempt

Chih-Kai Yang, Yu-Kuan Fu, Chen-An Li, Yi-Cheng Lin, Yu-Xiang Lin, Wei-Chih Chen, Ho Lam Chung, Chun-Yi Kuan, Wei-Ping Huang, Ke-Han Lu, Tzu-Quan Lin, Hsiu-Hsuan Wang

arXiv:2411.07111v212.222 citations

h-index: 14

Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for conversational AI in Taiwanese Mandarin. As a first attempt in this specific domain, it represents an incremental step.

The researchers built a spoken large language model for real-time, speech-to-speech interaction in Taiwanese Mandarin. The model is designed with full-duplex capabilities (simultaneous speaking and listening) to support seamless multi-turn conversations.

This technical report presents our initial attempt to build a spoken large language model (LLM) for Taiwanese Mandarin, specifically tailored to enable real-time, speech-to-speech interaction in multi-turn conversations. Our end-to-end model incorporates a decoder-only transformer architecture and aims to achieve seamless interaction while preserving the conversational flow, including full-duplex capabilities allowing simultaneous speaking and listening. The paper also details the training process, including data preparation with synthesized dialogues and adjustments for real-time interaction. We also developed a platform to evaluate conversational fluency and response coherence in multi-turn dialogues. We hope the release of the report can contribute to the future development of spoken LLMs in Taiwanese Mandarin.
