CL AI Jan 2

ChiEngMixBench: Evaluating Large Language Models on Spontaneous and Natural Chinese-English Code-Mixed Generation

arXiv:2601.16217v1
h-index: 5
Originality: Incremental advance
AI Analysis

The paper addresses the need for better evaluation of code-mixing in AI systems that serve multilingual communities. The advance is incremental, since it builds on existing theories such as the Matrix Language Frame (MLF).

The paper addresses the evaluation of large language models on Chinese-English code-mixed generation. It introduces ChiEngMixBench, a benchmark that assesses spontaneity and naturalness, shows that these metrics systematically distinguish model performance, and uncovers an emergent terminology layering strategy.

Code-mixing is increasingly prevalent in interactions between humans and large language models, yet existing work often reduces it to a translation or convertibility problem, making it difficult to assess whether a model's switching behavior is context-appropriate and aligned with human conventions. We introduce ChiEngMixBench, the first benchmark designed to evaluate code-mixing ability in authentic community contexts, built upon a general construction pipeline that enables scalable dataset development across domains and bilingual pairs. ChiEngMixBench formulates code-mixing as a cognitive alignment problem, characterized by two complementary signals: Spontaneity and Naturalness. Empirical evaluation shows that our metrics can systematically distinguish code-mixing performance across models. Beyond benchmarking, we further uncover an implicitly emergent Terminology Layering Strategy, a phenomenon consistent with the Matrix Language Frame (MLF) theory, indicating structured cognitive alignment between multilingual large language models and human communication.

