CL, AI · Jul 11, 2025

From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation

AI2, Anthropic, Berkeley, DeepMind, Georgia Tech, Meta AI, +3
arXiv:2507.08924v2 · 2 citations · h-index: 74 · EMNLP
Originality: Synthesis-oriented
AI Analysis

This paper provides domain-specific benchmarks for evaluating the professional knowledge of Korean LLMs, though the contribution is incremental because it builds on the existing KMMLU.

The authors introduce two Korean expert-level benchmarks, KMMLU-Redux (reconstructed from the existing KMMLU with critical errors removed) and KMMLU-Pro (based on Korean National Professional Licensure exams), to evaluate LLMs' applicability in real-world industrial scenarios. Their experiments indicate that the two benchmarks comprehensively represent industrial knowledge in Korea.

The development of Large Language Models (LLMs) requires robust benchmarks that encompass not only academic domains but also industrial fields to effectively evaluate their applicability in real-world scenarios. In this paper, we introduce two Korean expert-level benchmarks. KMMLU-Redux, reconstructed from the existing KMMLU, consists of questions from the Korean National Technical Qualification exams, with critical errors removed to enhance reliability. KMMLU-Pro is based on Korean National Professional Licensure exams to reflect professional knowledge in Korea. Our experiments demonstrate that these benchmarks comprehensively represent industrial knowledge in Korea. We release our dataset publicly.

