CL | Oct 20, 2025

Language Confusion Gate: Language-Aware Decoding Through Model Self-Distillation

arXiv:2510.17555v1 | 2 citations | h-index: 17 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses a specific issue in multilingual text generation that affects users of LLMs, offering an incremental improvement over existing methods.

The paper tackles the problem of language confusion in large language models, where unintended language mixing occurs during text generation, by introducing the Language Confusion Gate (LCG), a plug-in solution that reduces confusion by an order of magnitude without harming task performance.

Large language models (LLMs) often experience language confusion, which is the unintended mixing of languages during text generation. Current solutions to this problem either require model retraining or cannot differentiate between harmful confusion and acceptable code-switching. This paper introduces the Language Confusion Gate (LCG), a lightweight, plug-in solution that filters tokens during decoding without altering the base LLM. The LCG is trained using norm-adjusted self-distillation to predict appropriate language families and apply masking only when needed. Our method is based on three findings: language confusion is infrequent, correct-language tokens are usually among the top predictions, and output token embedding norms are larger for high-resource languages, which biases sampling. When evaluated across various models, including Qwen3, GPT-OSS, Gemma3, and Llama3.1, LCG decreases language confusion significantly, often by an order of magnitude, without negatively impacting task performance. Code is available at https://github.com/collinzrj/language_confusion_gate.
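The decoding-time filtering described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `gate_logits`, the toy vocabulary, the family labels, and the top-k trigger rule are all assumptions made for illustration. In the paper the allowed language families are predicted by a trained gate; here they are simply passed in by hand, and the norm-adjusted self-distillation training is not shown.

```python
import math

def gate_logits(logits, token_family, allowed_families, top_k=3):
    """Hypothetical gate: mask disallowed-family tokens only when needed.

    Masking is applied only if a disallowed-family token appears among the
    top-k candidates (an assumed trigger rule, not taken from the paper).
    """
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    top = ranked[:top_k]
    needs_mask = any(token_family[i] not in allowed_families for i in top)
    has_allowed = any(f in allowed_families for f in token_family)
    if not needs_mask or not has_allowed:
        # Leave the distribution untouched (also avoids masking everything).
        return list(logits)
    return [
        x if token_family[i] in allowed_families else -math.inf
        for i, x in enumerate(logits)
    ]

def softmax(logits):
    """Numerically stable softmax; masked (-inf) entries get probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example: an English-context step where a CJK token has the top logit.
vocab = ["the", "cat", "猫", "的", "."]
families = ["latin", "latin", "cjk", "cjk", "symbol"]
logits = [2.0, 1.0, 2.5, 0.5, 0.1]

gated = gate_logits(logits, families, allowed_families={"latin", "symbol"})
probs = softmax(gated)
best = vocab[max(range(len(probs)), key=probs.__getitem__)]
```

In this toy case the CJK tokens receive zero probability, and the highest-scoring allowed token is chosen instead. When every top candidate already belongs to an allowed family, the logits pass through unchanged, which mirrors the abstract's point that confusion is infrequent and masking should be applied only when needed.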

