CL | Feb 18, 2025

KazMMLU: Evaluating Language Models on Kazakh, Russian, and Regional Knowledge of Kazakhstan

arXiv:2502.12829v2 | 11 citations | h-index: 47 | ACL
Originality: Synthesis-oriented
AI Analysis

This paper addresses the lack of evaluation benchmarks for Kazakh language models, which matters to researchers and developers working on Kazakh-centric NLP. The contribution is incremental, since it adapts an existing benchmark format (MMLU) to a new language context.

The authors tackle the underrepresentation of the Kazakh language in NLP by creating KazMMLU, the first MMLU-style dataset for Kazakh, with 23,000 questions in Kazakh and Russian. They find that state-of-the-art multilingual models perform poorly on it, with significant gaps compared to high-resource languages.

Although Kazakhstan has a population of twenty million, its culture and language remain underrepresented in the field of natural language processing. Large language models (LLMs) continue to advance worldwide, but progress in the Kazakh language has been limited, as seen in the scarcity of dedicated models and benchmark evaluations. To address this gap, we introduce KazMMLU, the first MMLU-style dataset specifically designed for the Kazakh language. KazMMLU comprises 23,000 questions that span multiple educational levels and subject areas, including STEM, humanities, and social sciences, sourced from authentic educational materials and manually validated by native speakers and educators. The dataset includes 10,969 Kazakh questions and 12,031 Russian questions, reflecting Kazakhstan's bilingual education system and rich local context. Our evaluation of several state-of-the-art multilingual models (Llama-3.1, Qwen-2.5, GPT-4, and DeepSeek V3) demonstrates substantial room for improvement, as even the best-performing models struggle to achieve competitive performance in Kazakh and Russian. These findings underscore significant performance gaps compared to high-resource languages. We hope that our dataset will enable further research and development of Kazakh-centric LLMs. Data and code will be made available upon acceptance.
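
For readers unfamiliar with MMLU-style scoring, the following is a minimal sketch of how per-language accuracy on multiple-choice questions could be computed. It is not the authors' evaluation code, which the abstract says is not yet released. The record fields (`language`, `answer`, `prediction`), the language codes, and the sample rows are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical records: the field names and values below are illustrative
# assumptions, not the schema of the KazMMLU release.
SAMPLE = [
    {"language": "kk", "answer": "B", "prediction": "B"},
    {"language": "kk", "answer": "A", "prediction": "C"},
    {"language": "ru", "answer": "D", "prediction": "D"},
    {"language": "ru", "answer": "A", "prediction": "A"},
]


def accuracy_by_language(records):
    """Return the fraction of exactly matching answer letters per language."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        lang = record["language"]
        total[lang] += 1
        if record["prediction"] == record["answer"]:
            correct[lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


if __name__ == "__main__":
    # The split sizes reported in the abstract add up to the stated total.
    assert 10_969 + 12_031 == 23_000
    print(accuracy_by_language(SAMPLE))
```

Reporting accuracy separately for the Kazakh and Russian splits, rather than as one pooled number, keeps the slightly larger Russian split from masking weaker Kazakh performance.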

