CL AS · Feb 19, 2025

LLM-Enhanced Dialogue Management for Full-Duplex Spoken Dialogue Systems

Hao Zhang, Weiwei Li, Rilin Chen, Vinay Kothapally, Meng Yu, Dong Yu

arXiv:2502.14145v2 · 13.013 citations · h-index: 5

Originality: Highly original

AI Analysis

This paper addresses the problem of scalable and efficient full-duplex communication for spoken dialogue systems. It represents an incremental improvement, offering a novel method for a known bottleneck.

The paper tackles real-time turn-taking coordination in full-duplex spoken dialogue systems. It proposes a semantic voice activity detection module that acts as the dialogue manager: a lightweight 0.5B LLM predicts control tokens for managing barge-ins and detecting query completion, which enables efficient real-time decision-making while reducing computational overhead.

Achieving full-duplex communication in spoken dialogue systems (SDS) requires real-time coordination between listening, speaking, and thinking. This paper proposes a semantic voice activity detection (VAD) module as a dialogue manager (DM) to efficiently manage turn-taking in full-duplex SDS. Implemented as a lightweight (0.5B) LLM fine-tuned on full-duplex conversation data, the semantic VAD predicts four control tokens to regulate turn-switching and turn-keeping, distinguishing between intentional and unintentional barge-ins while detecting query completion for handling user pauses and hesitations. By processing input speech in short intervals, the semantic VAD enables real-time decision-making, while the core dialogue engine (CDE) is only activated for response generation, reducing computational overhead. This design allows independent DM optimization without retraining the CDE, balancing interaction accuracy and inference efficiency for scalable, next-generation full-duplex SDS.
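The control flow described in the abstract can be sketched as a small two-state machine. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the four control tokens, so the `Token` names below are hypothetical, the `predict` callable stands in for the fine-tuned 0.5B LLM, `respond` stands in for the core dialogue engine (CDE), and plain text strings stand in for short intervals of input speech.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class State(Enum):
    LISTENING = "listening"
    SPEAKING = "speaking"


class Token(Enum):
    # Hypothetical names: the source says only that four control tokens
    # regulate turn-keeping and turn-switching.
    KEEP_LISTENING = "keep_listening"            # user paused, query incomplete
    SWITCH_TO_SPEAKING = "switch_to_speaking"    # query complete
    KEEP_SPEAKING = "keep_speaking"              # unintentional barge-in
    SWITCH_TO_LISTENING = "switch_to_listening"  # intentional barge-in


@dataclass
class DialogueManager:
    predict: Callable[[State, str], Token]  # stand-in for the semantic VAD
    respond: Callable[[str], str]           # stand-in for the CDE
    state: State = State.LISTENING
    buffer: List[str] = field(default_factory=list)
    cde_calls: int = 0
    last_response: str = ""

    def step(self, chunk: str) -> Token:
        """Process one short input interval and update the turn state."""
        token = self.predict(self.state, chunk)
        if self.state is State.LISTENING:
            self.buffer.append(chunk)
            if token is Token.SWITCH_TO_SPEAKING:
                # The CDE is invoked only here, once the query is complete.
                query = " ".join(self.buffer)
                self.buffer.clear()
                self.cde_calls += 1
                self.last_response = self.respond(query)
                self.state = State.SPEAKING
        elif token is Token.SWITCH_TO_LISTENING:
            # Intentional barge-in: stop speaking and keep the user's words.
            self.buffer.append(chunk)
            self.state = State.LISTENING
        return token


def toy_predict(state: State, chunk: str) -> Token:
    """Rule-based placeholder for the LLM, for demonstration only."""
    if state is State.LISTENING:
        if chunk.endswith("?"):
            return Token.SWITCH_TO_SPEAKING
        return Token.KEEP_LISTENING
    if chunk.startswith("wait"):
        return Token.SWITCH_TO_LISTENING
    return Token.KEEP_SPEAKING
```

In this sketch the dialogue manager runs on every interval, while `respond` runs only when a turn switch to speaking is predicted, which mirrors the abstract's point that the CDE is activated only for response generation. Because `predict` is an injected callable, it can be swapped or retuned without touching `respond`.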
