CL · AI · AS · Nov 13, 2025

MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models

arXiv:2511.10262v1 · 5 citations · h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work addresses the inadequate evaluation tools available to researchers and developers working on real-time conversational AI, though it is incremental, building on existing benchmarking efforts.

The paper tackles the lack of benchmarks for evaluating multi-round conversations in Full-Duplex Speech Language Models (FD-SLMs). It introduces MTR-DuplexBench to assess dialogue quality, conversational dynamics, instruction following, and safety; the results show that current FD-SLMs struggle to stay consistent across rounds.

Full-Duplex Speech Language Models (FD-SLMs) enable real-time, overlapping conversational interactions, offering a more dynamic user experience compared to traditional half-duplex models. However, existing benchmarks primarily focus on evaluating single-round interactions and conversational features, neglecting the complexities of multi-round communication and critical capabilities such as instruction following and safety. Evaluating FD-SLMs in multi-round settings poses significant challenges, including blurred turn boundaries in communication and context inconsistency during model inference. To address these gaps, we introduce MTR-DuplexBench, a novel benchmark that segments continuous full-duplex dialogues into discrete turns, enabling comprehensive, turn-by-turn evaluation of FD-SLMs across dialogue quality, conversational dynamics, instruction following, and safety. Experimental results reveal that current FD-SLMs face difficulties in maintaining consistent performance across multiple rounds and evaluation dimensions, highlighting the necessity and effectiveness of our proposed benchmark. The benchmark and code will be available in the future.

