Evaluating Multimodal Generative AI with Korean Educational Standards
This work addresses the need for non-English benchmarks, specifically Korean ones, for evaluating AI in educational contexts. Its contribution is incremental, however, since it applies existing evaluation methods to new data.
The paper introduces KoNET, a benchmark built from Korean national educational tests for evaluating multimodal generative AI systems across four exam levels. Models are analyzed by question difficulty, subject diversity, and human error rates, giving insight into performance in a less-explored language.
This paper presents the Korean National Educational Test Benchmark (KoNET), a new benchmark designed to evaluate Multimodal Generative AI Systems using Korean national educational tests. KoNET comprises four exams: the Korean Elementary General Educational Development Test (KoEGED), its Middle (KoMGED) and High (KoHGED) counterparts, and the Korean College Scholastic Ability Test (KoCSAT). These exams are renowned for their rigorous standards and diverse questions, facilitating a comprehensive analysis of AI performance across different educational levels. By focusing on Korean, KoNET provides insights into model performance in less-explored languages. We assess a range of models (open-source, open-access, and closed APIs) by examining difficulties, subject diversity, and human error rates. The code and dataset builder will be fully open-sourced at https://github.com/naver-ai/KoNET.
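To make the three analysis axes in the abstract concrete (exam level, subject, and human error rate), the following is a minimal sketch of how per-group accuracy might be tabulated. It is not the authors' evaluation code. The record fields (`exam`, `subject`, `human_error_rate`, `correct`), the sample values, the helper names, and the 0.5 threshold are all hypothetical, since the abstract does not describe the dataset schema or scoring procedure.

```python
from collections import defaultdict

# Hypothetical records: one per question, with a flag for whether the model
# answered correctly. Field names and values are illustrative assumptions,
# not the actual KoNET schema or results.
records = [
    {"exam": "KoEGED", "subject": "Math",    "human_error_rate": 0.10, "correct": True},
    {"exam": "KoEGED", "subject": "Korean",  "human_error_rate": 0.20, "correct": True},
    {"exam": "KoMGED", "subject": "Math",    "human_error_rate": 0.30, "correct": True},
    {"exam": "KoMGED", "subject": "Science", "human_error_rate": 0.60, "correct": False},
    {"exam": "KoHGED", "subject": "English", "human_error_rate": 0.40, "correct": True},
    {"exam": "KoHGED", "subject": "Math",    "human_error_rate": 0.70, "correct": False},
    {"exam": "KoCSAT", "subject": "Math",    "human_error_rate": 0.80, "correct": False},
    {"exam": "KoCSAT", "subject": "Korean",  "human_error_rate": 0.55, "correct": True},
]


def accuracy_by(rows, key):
    """Return {group: accuracy} where groups are the distinct values of `key`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        hits[row[key]] += int(row["correct"])
    return {group: hits[group] / totals[group] for group in totals}


def accuracy_by_human_difficulty(rows, threshold=0.5):
    """Split questions by human error rate and return accuracy per bucket.

    The threshold is an arbitrary illustrative cut-off, not a value from the paper.
    """
    buckets = {"easy_for_humans": [], "hard_for_humans": []}
    for row in rows:
        hard = row["human_error_rate"] >= threshold
        buckets["hard_for_humans" if hard else "easy_for_humans"].append(row["correct"])
    return {name: sum(flags) / len(flags) for name, flags in buckets.items() if flags}


if __name__ == "__main__":
    print("By exam:", accuracy_by(records, "exam"))
    print("By subject:", accuracy_by(records, "subject"))
    print("By human difficulty:", accuracy_by_human_difficulty(records))
```

Grouping by a single key keeps the sketch small. Comparing the "hard for humans" bucket against the "easy for humans" one is one simple way to check whether a model struggles on the same questions that human test-takers do.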