CL · Mar 3, 2025

MiLiC-Eval: Benchmarking Multilingual LLMs for China's Minority Languages

Chen Zhang, Mingxu Tao, Zhiyuan Liao, Yansong Feng

Peking U

arXiv:2503.01150v2 · 12.06 citations · h-index: 14 · Has Code · ACL

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of evaluating and improving LLMs for underrepresented minority languages in China, such as Tibetan and Uyghur. Its contribution is incremental: it provides a new benchmark for evaluating existing methods.

The authors tackle the problem of large language models (LLMs) struggling with low-resource minority languages in China by introducing MiLiC-Eval, a benchmark of 24K instances across 9 tasks. Their evaluation on this benchmark reveals that open-source LLMs perform poorly on syntax-intensive tasks and multi-script languages.

Large language models (LLMs) excel in high-resource languages but struggle with low-resource languages (LRLs), particularly those spoken by minority communities in China, such as Tibetan, Uyghur, Kazakh, and Mongolian. To systematically track the progress in these languages, we introduce MiLiC-Eval, a benchmark designed for minority languages in China, featuring 24K instances across 9 tasks. MiLiC-Eval focuses on underrepresented writing systems. Its parallelism between tasks and languages can provide a faithful and fine-grained assessment of linguistic and problem-solving skills. Our evaluation reveals that open-source LLMs perform poorly on syntax-intensive tasks and multi-script languages. We further demonstrate how MiLiC-Eval can help advance LRL research in handling diverse writing systems and understanding the process of language adaptation.
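The abstract credits MiLiC-Eval's parallelism between tasks and languages with enabling a fine-grained assessment. As a minimal sketch, assuming per-instance results are available as hypothetical (language, task, is_correct) triples, parallel instances let one fill a language × task accuracy grid and compare languages directly on the same task. The record layout, the task name `task_x`, the helper names, and the toy values are illustrative assumptions, not the benchmark's released code, data format, or results.

```python
from collections import defaultdict

# Hypothetical per-instance result: (language, task, is_correct).
# This layout is an assumption for illustration, not MiLiC-Eval's actual format.
Result = tuple[str, str, bool]


def accuracy_grid(results: list[Result]) -> dict[tuple[str, str], float]:
    """Compute accuracy for every (language, task) cell seen in the results."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    hits: dict[tuple[str, str], int] = defaultdict(int)
    for lang, task, ok in results:
        totals[(lang, task)] += 1
        hits[(lang, task)] += int(ok)
    return {cell: hits[cell] / totals[cell] for cell in totals}


def task_gap(grid: dict[tuple[str, str], float], task: str, lang_a: str, lang_b: str) -> float:
    """Accuracy difference between two languages on the same parallel task."""
    return grid[(lang_a, task)] - grid[(lang_b, task)]


# Toy values for illustration only; these are NOT results from the paper.
toy = [
    ("Tibetan", "task_x", True), ("Tibetan", "task_x", False),
    ("Uyghur", "task_x", True), ("Uyghur", "task_x", True),
]
grid = accuracy_grid(toy)
print(grid[("Tibetan", "task_x")])                    # 0.5
print(task_gap(grid, "task_x", "Uyghur", "Tibetan"))  # 0.5
```

Because each cell is computed over the same task in every language, a gap between two cells on one task points to a language-specific weakness rather than a difference in task difficulty, which is the kind of comparison parallel data makes possible.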

