CL, SD, AS | Nov 11, 2024

Building a Taiwanese Mandarin Spoken Language Model: A First Attempt

arXiv:2411.07111v2 | 22 citations | h-index: 14
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for conversational AI in Taiwanese Mandarin. As a first attempt in this specific domain, it is an incremental step.

The researchers address real-time, speech-to-speech interaction in Taiwanese Mandarin by building a spoken large language model with full-duplex capabilities, intended to support seamless multi-turn conversations.

This technical report presents our initial attempt to build a spoken large language model (LLM) for Taiwanese Mandarin, specifically tailored to enable real-time, speech-to-speech interaction in multi-turn conversations. Our end-to-end model incorporates a decoder-only transformer architecture and aims to achieve seamless interaction while preserving the conversational flow, including full-duplex capabilities allowing simultaneous speaking and listening. The paper also details the training process, including data preparation with synthesized dialogues and adjustments for real-time interaction. We also developed a platform to evaluate conversational fluency and response coherence in multi-turn dialogues. We hope the release of the report can contribute to the future development of spoken LLMs in Taiwanese Mandarin.

Code Implementations: 1 repo
