CL | May 20, 2025

Language Mixing in Reasoning Language Models: Patterns, Impact, and Internal Causes

arXiv:2505.14815v3 | 19 citations | h-index: 24 | EMNLP
Originality: Incremental advance
AI Analysis

This work addresses inconsistent reasoning performance in multilingual settings, a concern for AI developers and researchers, and offers incremental improvements by controlling the language used for reasoning.

The study systematically analyzed language mixing in reasoning language models, showing that forcing reasoning in Latin or Han scripts via constrained decoding improves accuracy, with concrete performance gains observed across 15 languages and varied tasks.

Reasoning language models (RLMs) excel at complex tasks by leveraging a chain-of-thought process to generate structured intermediate steps. However, language mixing, i.e., reasoning steps containing tokens from languages other than that of the prompt, has been observed in their outputs and shown to affect performance, though its impact remains debated. We present the first systematic study of language mixing in RLMs, examining its patterns, impact, and internal causes across 15 languages, 7 task difficulty levels, and 18 subject areas, and show how all three factors influence language mixing. Moreover, we demonstrate that the choice of reasoning language significantly affects performance: forcing models to reason in Latin or Han scripts via constrained decoding notably improves accuracy. Finally, we show that the script composition of reasoning traces closely aligns with that of the model's internal representations, indicating that language mixing reflects latent processing preferences in RLMs. Our findings provide actionable insights for optimizing multilingual reasoning and open new directions for controlling reasoning languages to build more interpretable and adaptable RLMs.

