CL · Mar 8

StyleBench: Evaluating Speech Language Models on Conversational Speaking Style Control

Haishu Zhao, Aokai Hao, Yuan Ge, Zhenqiang Hong, Tong Xiao, Jingbo Zhu

arXiv:2603.07599v1 · 2 citations

Predicted impact: top 88% in CL · last 90 days

Originality: Incremental advance

AI Analysis

This benchmark addresses the lack of systematic evaluation for conversational speaking style control in SLMs, a gap that matters to researchers and developers building more realistic, customizable interactive AI experiences.

This paper introduces StyleBench, a multi-turn dialogue benchmark designed to evaluate the ability of Speech Language Models (SLMs) to control speaking style intensity across four dimensions: emotion, speed, volume, and pitch. The evaluation reveals performance gaps between leading SLMs and omni language models (OLMs).

Speech language models (SLMs) have significantly extended the interactive capability of text-based Large Language Models (LLMs) by incorporating paralinguistic information. To provide a more realistic interactive experience with customized styles, current SLMs can interpret user prompts and control speaking style intensity during a dialogue. However, there remains a lack of systematic benchmarks that quantify and evaluate this style intensity control ability in conversations. In this paper, we propose StyleBench, a multi-turn dialogue benchmark for comprehensively evaluating style intensity control across four dimensions: emotion, speed, volume, and pitch. Our results reveal performance gaps between leading SLMs and omni language models (OLMs), and suggest underlying reasons for these gaps as well as promising approaches for future exploration.
