CL · May 22, 2025

Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models

Ercong Nie, Helmut Schmid, Hinrich Schütze

arXiv:2505.16538v2 · 13.08 citations · h-index: 13 · Has Code · EMNLP

Originality: Incremental advance

AI Analysis

This work addresses a critical challenge for users of multilingual AI systems and offers a novel intervention method, though it is an incremental advance that builds on existing interpretability techniques.

The study tackles language confusion in English-centric large language models by using mechanistic interpretability to identify and edit critical neurons, which substantially mitigates confusion while largely preserving general competence and fluency.

Language confusion -- where large language models (LLMs) generate unintended languages against the user's need -- remains a critical challenge, especially for English-centric models. We present the first mechanistic interpretability (MI) study of language confusion, combining behavioral benchmarking with neuron-level analysis. Using the Language Confusion Benchmark (LCB), we show that confusion points (CPs) -- specific positions where language switches occur -- are central to this phenomenon. Through layer-wise analysis with TunedLens and targeted neuron attribution, we reveal that transition failures in the final layers drive confusion. We further demonstrate that editing a small set of critical neurons, identified via comparative analysis with a multilingual-tuned counterpart, substantially mitigates confusion while largely preserving general competence and fluency. Our approach matches multilingual alignment in confusion reduction for many languages and yields cleaner, higher-quality outputs. These findings provide new insights into the internal dynamics of LLMs and highlight neuron-level interventions as a promising direction for robust, interpretable multilingual language modeling. Code and data are available at: https://github.com/ercong21/lang_confusion.
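To make the notion of a confusion point concrete, here is a minimal sketch that finds the first token whose script differs from the expected one. It is an illustration only, not the paper's or the LCB's detection procedure: the function names `char_script` and `find_confusion_point` are hypothetical, the input is assumed to be already tokenized, and the Unicode-script heuristic cannot separate languages that share a script (for example English and German).

```python
import unicodedata


def char_script(ch):
    """Return a coarse script label for a letter (e.g. "LATIN", "CYRILLIC"), else None."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    # Unicode character names start with the script, e.g. "CYRILLIC CAPITAL LETTER PE".
    return name.split(" ")[0] if name else None


def find_confusion_point(tokens, target_script):
    """Return the index of the first token containing a letter outside target_script.

    Hypothetical helper: a script-level proxy for a confusion point.
    Returns None if every token stays in the target script.
    """
    for i, tok in enumerate(tokens):
        scripts = {char_script(c) for c in tok} - {None}
        if any(s != target_script for s in scripts):
            return i
    return None


# Example: a Russian response that drifts into English at the fourth token.
tokens = ["Привет", ",", "это", "the", "answer"]
confusion_index = find_confusion_point(tokens, "CYRILLIC")
```

A real pipeline would more plausibly use a language-identification model on words or lines; the script check is used here only because it is self-contained.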

