CoDeTT: A Context-Aware Decision Benchmark for Turn-Taking Evaluation
CoDeTT provides a standardized benchmark for the systematic evaluation of turn-taking systems, addressing a domain-specific need in spoken dialogue research.
The authors address the fragmented evaluation of turn-taking models in spoken dialogue systems with CoDeTT, a context-aware benchmark that formulates turn-taking as a structured decision problem. Assessing existing models under a unified evaluation protocol, they observed substantial performance disparities across decision types and interaction scenarios.
Turn-taking modeling is fundamental to spoken dialogue systems, yet its evaluation remains fragmented and often limited to binary boundary detection under narrow interaction settings. Such protocols hinder systematic comparison and obscure model weaknesses across conversational conditions. We present CoDeTT, a context-aware decision benchmark for turn-taking evaluation. CoDeTT formulates turn-taking as a structured decision problem and constructs a multi-scenario dataset with fine-grained decision categories and controlled context variations. Under a unified evaluation protocol, we assess representative existing models and observe substantial performance disparities across decision types and interaction scenarios. CoDeTT provides a standardized benchmark for systematic and context-aware evaluation of turn-taking systems. The benchmark dataset and evaluation toolkit are available at https://github.com/YingaoWang-casia/CoDeTT.github.io.
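The abstract's central finding is that aggregate scores hide disparities across decision types and interaction scenarios. A minimal sketch of that kind of stratified evaluation follows. It is not the CoDeTT toolkit's API: the `Example` record layout, the functions `stratified_accuracy` and `macro_average`, the decision labels (`hold`, `shift`, `backchannel`), and the scenario names are all hypothetical placeholders, not the benchmark's actual taxonomy or data format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    # Hypothetical record layout; the real CoDeTT data format may differ.
    scenario: str  # interaction scenario the sample comes from
    gold: str      # reference turn-taking decision
    pred: str      # decision predicted by the model under evaluation


def stratified_accuracy(examples, key):
    """Accuracy computed separately for each group returned by `key`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        group = key(ex)
        total[group] += 1
        correct[group] += int(ex.pred == ex.gold)
    return {g: correct[g] / total[g] for g in total}


def macro_average(per_group):
    """Unweighted mean over groups, so rare decision types count equally."""
    return sum(per_group.values()) / len(per_group)


if __name__ == "__main__":
    # Toy, made-up predictions used only to illustrate the breakdown.
    data = [
        Example("two_party", "hold", "hold"),
        Example("two_party", "shift", "shift"),
        Example("two_party", "backchannel", "hold"),
        Example("noisy", "shift", "hold"),
        Example("noisy", "hold", "hold"),
        Example("noisy", "backchannel", "hold"),
    ]
    by_decision = stratified_accuracy(data, key=lambda ex: ex.gold)
    by_scenario = stratified_accuracy(data, key=lambda ex: ex.scenario)
    print(by_decision)   # per-decision-type accuracy
    print(by_scenario)   # per-scenario accuracy
    print(macro_average(by_decision))
```

On this toy data a model that mostly predicts `hold` scores perfectly on that class but poorly on the others, which a single pooled accuracy would blur. Grouping by the gold label (rather than the prediction) keeps each group's denominator fixed across models, so per-type scores stay comparable.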