CL · Sep 19, 2024

Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models

Peiyi Zhang, Yazhou Zhang, Bo Wang, Lu Rong, Prayag Tiwari, Jing Qin

arXiv:2409.12739v3 · 6 citations · h-index: 28

Originality: Synthesis-oriented

AI Analysis

This work addresses the need to evaluate LLMs on Chinese education values, which matters to educators and developers in China. The contribution is incremental: it introduces a new benchmark rather than a novel method.

The authors evaluate large language models (LLMs) on Chinese education values by creating Edu-Values, a benchmark of 1,418 questions across seven core values. They find that Chinese LLMs outperform English LLMs, with Qwen 2 ranking first at a score of 81.37, and that using the benchmark as an external knowledge repository for RAG improves alignment.

In this paper, we present Edu-Values, the first Chinese education values evaluation benchmark that includes seven core values: professional philosophy, teachers' professional ethics, education laws and regulations, cultural literacy, educational knowledge and skills, basic competencies, and subject knowledge. We meticulously design 1,418 questions, covering multiple-choice, multi-modal question answering, subjective analysis, adversarial prompts, and Chinese traditional culture (short answer) questions. We conduct human-feedback-based automatic evaluation over 21 state-of-the-art (SoTA) LLMs, and highlight three main findings: (1) due to differences in educational culture, Chinese LLMs outperform English LLMs, with Qwen 2 ranking first with a score of 81.37; (2) LLMs often struggle with teachers' professional ethics and professional philosophy; (3) leveraging Edu-Values to build an external knowledge repository for RAG significantly improves LLMs' alignment. This demonstrates the effectiveness of the proposed benchmark.
