CLSDDec 29, 2025

Style Amnesia: Investigating Speaking Style Degradation and Mitigation in Multi-Turn Spoken Language Models

arXiv:2512.23578v21 citationsh-index: 7Has Code
Originality Incremental advance
AI Analysis

This addresses a key limitation in conversational AI for applications requiring consistent style, though it is incremental as it focuses on diagnosing and mitigating an existing problem.

The paper investigates the problem of style amnesia in spoken language models (SLMs), where models fail to maintain instructed speaking styles like emotion or accent over multi-turn conversations, and finds that explicit recall instructions can partially mitigate this issue.

In this paper, we show that when spoken language models (SLMs) are instructed to speak in a specific speaking style at the beginning of a multi-turn conversation, they cannot maintain the required speaking styles after several turns of interaction; we refer to this as the style amnesia of SLMs. We focus on paralinguistic speaking styles, including emotion, accent, volume, and speaking speed. We evaluate three proprietary and two open-source SLMs, demonstrating that none of these models can maintain a consistent speaking style when instructed to do so. We further show that when SLMs are asked to recall the style instruction in later turns, they can recall the style instruction, but they fail to express it throughout the conversation. We also show that explicitly asking the model to recall the style instruction can partially mitigate style amnesia. In addition, we examine various prompting strategies and find that SLMs struggle to follow the required style when the instruction is placed in system messages rather than user messages, which contradicts the intended function of system prompts.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes