CL · AI · Jan 12

Can Large Language Models Understand, Reason About, and Generate Code-Switched Text?

arXiv:2601.07153v10.273 citations · h-index: 19 · Has Code
AI Analysis 25

This paper addresses the robustness of multilingual LLMs for users in code-switching contexts; it is an incremental evaluation study.

The paper evaluates large language models' capabilities with code-switched text, introducing the CodeMixQA benchmark and finding persistent challenges in reasoning and generation tasks.

Code-switching is a pervasive phenomenon in multilingual communication, yet the robustness of large language models (LLMs) in mixed-language settings remains insufficiently understood. In this work, we present a comprehensive evaluation of LLM capabilities in understanding, reasoning over, and generating code-switched text. We introduce CodeMixQA, a novel benchmark with high-quality human annotations, comprising 16 diverse parallel code-switched language-pair variants that span multiple geographic regions and code-switching patterns, and include both original scripts and their transliterated forms. Using this benchmark, we analyze the reasoning behavior of LLMs on code-switched question-answering tasks, shedding light on how models process and reason over mixed-language inputs. We further conduct a systematic evaluation of LLM-generated synthetic code-switched text, focusing on both naturalness and semantic fidelity, and uncover key limitations in current generation capabilities. Our findings reveal persistent challenges in both reasoning and generation under code-switching conditions and provide actionable insights for building more robust multilingual LLMs. We release the dataset and code as open source.
