Lost in the Tower of Babel: The Adverse Effects of Incidental Multilingualism in LLMs
For NLP researchers and practitioners, the paper highlights a critical flaw in current multilingual LLM evaluation and deployment, calling for a paradigm shift toward intentional multilingual design.
The paper argues that current LLMs exhibit 'incidental multilingualism' because they are trained on uneven web data, which leads to unequal and brittle behavior across languages. A focused empirical study shows discrepancies between the languages models self-report as supported and the languages they actually respond in, and a simple language-change attack exposes these failures.
This paper argues that contemporary multilingual NLP has converged on a fragile and misleading paradigm of incidental multilingualism. Today's LLMs appear multilingual largely because they are trained on massive, uneven web corpora, not because multilingual or multicultural competence has been treated as a core design objective. We contend that this paradigm systematically produces unequal, brittle, and opaque behavior across languages, with severe consequences in real-world and agentic deployments where models must reason, plan, and act across multiple linguistic contexts. We report a focused empirical study of two practical questions: which languages models self-report as supported and which languages they actually respond in across multilingual prompts. We additionally demonstrate how even a simple language-change attack can surface these failures and expose hidden assumptions about language in LLM-based systems. To address this, we call for a shift toward multilingualism by design: a research agenda that treats equitable multilingual performance, cultural grounding, and cross-lingual behavioral understanding as first-class goals in all aspects of the model pipeline.
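The comparison behind the first empirical question (self-reported versus actual response languages) can be illustrated with a minimal sketch. It is not the authors' protocol. The function name `audit_language_support`, the 0.5 threshold, the ISO 639-1 codes, and the toy records are all hypothetical. The sketch also assumes that each response's language has already been labeled by some external language-identification step, which is not shown.

```python
from collections import Counter


def audit_language_support(self_reported, observations, threshold=0.5):
    """Hypothetical audit: compare claimed language support with observed replies.

    self_reported: set of language codes the model claims to support.
    observations: iterable of (prompt_lang, response_lang) pairs, where
        response_lang is assumed to come from an external language-ID step.
    threshold: hypothetical cut-off for calling a language "reliably answered".
    """
    totals = Counter()   # number of prompts per prompt language
    matches = Counter()  # number of replies in the same language as the prompt
    for prompt_lang, response_lang in observations:
        totals[prompt_lang] += 1
        if response_lang == prompt_lang:
            matches[prompt_lang] += 1

    # Fraction of same-language replies for each tested prompt language.
    fidelity = {lang: matches[lang] / totals[lang] for lang in totals}

    # Claimed languages that were tested but mostly not answered in.
    overclaimed = sorted(
        lang for lang in self_reported
        if lang in fidelity and fidelity[lang] < threshold
    )
    # Unclaimed languages that the model nonetheless answers in reliably.
    underclaimed = sorted(
        lang for lang, score in fidelity.items()
        if lang not in self_reported and score >= threshold
    )
    return fidelity, overclaimed, underclaimed


# Toy, hypothetical inputs for illustration only (not results from the paper).
claimed = {"en", "fr", "sw"}
records = [("en", "en"), ("fr", "fr"), ("fr", "en"), ("sw", "en"), ("yo", "yo")]
fidelity, overclaimed, underclaimed = audit_language_support(claimed, records)
print(fidelity)      # {'en': 1.0, 'fr': 0.5, 'sw': 0.0, 'yo': 1.0}
print(overclaimed)   # ['sw']
print(underclaimed)  # ['yo']
```

Keeping the two mismatch directions separate makes both kinds of discrepancy visible: languages a model claims but does not honor, and languages it handles without claiming them.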