Synchronization and Turn-Taking in Full-Duplex Speech Dialogue Models

Pablo Riera, Pablo Brusco, Cristina Kuo, Marcelo Sancinetti, S. R. K. Branavan

arXiv:2605.2035645.2

AI Analysis

For researchers building more natural spoken dialogue systems, this work provides initial evidence of neural coupling-like dynamics in full-duplex models, though the findings are preliminary and based on simulated interactions.

This paper studies how full-duplex spoken dialogue models synchronize their internal representations during interaction and whether those representations encode anticipatory turn-taking cues. Without channel noise, synchronization peaks near zero lag; it degrades as noise is added. Internal states also predict turn-taking ahead of time.

Full-duplex spoken dialogue models (SDMs) can listen and speak simultaneously, enabling interaction dynamics closer to human conversation than turn-based systems. Inspired by neural coupling in human communication, we study how such models coordinate their internal representations during interaction. We simulate full-duplex dialogues between two instances of the pretrained Moshi model under controlled conditions, manipulating channel noise and decoding bias. Synchronization is measured using Centered Kernel Alignment (CKA) across temporal lags, while anticipatory turn-taking cues are probed from delayed internal activations using causal LSTM models, from both speaker and listener perspectives. We find strong representational synchronization under noise-free conditions, peaking near zero lag and degrading with noise, and we show that internal states encode anticipatory information that supports turn-taking prediction ahead of time.
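
The lagged synchronization measure described above can be illustrated with a minimal sketch. It assumes the linear variant of CKA (the abstract does not say which kernel is used) and synthetic activations in place of real Moshi hidden states; the function names `linear_cka` and `lagged_cka`, the array shapes, and the 3-frame shift are all hypothetical choices made for illustration, not the authors' setup.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two (n_frames, n_features) activation matrices."""
    # Center each feature so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)


def lagged_cka(a, b, max_lag):
    """CKA between a[t] and b[t + lag] for every lag in [-max_lag, max_lag]."""
    n = min(len(a), len(b))
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xa, xb = a[: n - lag], b[lag:n]
        else:
            xa, xb = a[-lag:n], b[: n + lag]
        scores[lag] = linear_cka(xa, xb)
    return scores


# Synthetic stand-in for two agents' hidden states (assumption: 500 frames,
# 16 features). Agent B repeats agent A's states 3 frames later.
rng = np.random.default_rng(0)
a = rng.standard_normal((500, 16))
b = rng.standard_normal((500, 16))
b[3:] = a[:-3]

scores = lagged_cka(a, b, max_lag=5)
best_lag = max(scores, key=scores.get)
print("peak lag:", best_lag)  # 3 by construction of the synthetic data
```

In this toy setup the peak sits at the lag built into the data; in the paper's noise-free condition the reported peak is near zero lag, which would correspond to the two model instances' representations aligning frame by frame.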
