CL | AI | Oct 9, 2025

Towards Human-Like Grading: A Unified LLM-Enhanced Framework for Subjective Question Evaluation

Fanwei Zhu, Jiaxuan He, Xiaoxiao Chen, Zulong Chen, Quan Lu, Chenrui Mei

arXiv:2510.07912v1 | 2.7 | h-index: 8 | ECAI

Originality: Incremental advance

AI Analysis

The paper addresses the problem of scalable, accurate grading for educators and institutions, though the advance is incremental because it builds on existing LLM capabilities.

The paper tackles the challenge of automatically grading diverse subjective questions by proposing a unified LLM-enhanced framework that integrates four modules for holistic evaluation. It reports consistent outperformance over baselines across multiple metrics and a successful real-world deployment in e-commerce exams.

Automatic grading of subjective questions remains a significant challenge in examination assessment due to the diversity in question formats and the open-ended nature of student responses. Existing works primarily focus on a specific type of subjective question and lack the generality to support comprehensive exams that contain diverse question types. In this paper, we propose a unified Large Language Model (LLM)-enhanced auto-grading framework that provides human-like evaluation for all types of subjective questions across various domains. Our framework integrates four complementary modules to holistically evaluate student answers. In addition to a basic text matching module that provides a foundational assessment of content similarity, we leverage the powerful reasoning and generative capabilities of LLMs to: (1) compare key knowledge points extracted from both student and reference answers, (2) generate a pseudo-question from the student answer to assess its relevance to the original question, and (3) simulate human evaluation by identifying content-related and non-content strengths and weaknesses. Extensive experiments on both general-purpose and domain-specific datasets show that our framework consistently outperforms traditional and LLM-based baselines across multiple grading metrics. Moreover, the proposed system has been successfully deployed in real-world training and certification exams at a major e-commerce enterprise.
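
The four-module structure described in the abstract can be pictured with a minimal Python sketch. The abstract does not say how module outputs are combined, so the weighted average, the module signature, the weight values, and all names here (`text_matching`, `grade`, `Module`) are assumptions made for illustration, not the authors' method. Only the text matching module is implemented, using the standard library's `difflib`; the three LLM-based modules are left as callables the reader would supply.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict

# Hypothetical interface: each module maps (question, reference, answer)
# to a score in [0, 1]. The paper does not specify this signature.
Module = Callable[[str, str, str], float]


def text_matching(question: str, reference: str, answer: str) -> float:
    """Surface similarity between reference and student answer.

    Stand-in for the basic text matching module; the paper's actual
    similarity measure is not given in the abstract.
    """
    return SequenceMatcher(None, reference.lower(), answer.lower()).ratio()


def grade(
    question: str,
    reference: str,
    answer: str,
    modules: Dict[str, Module],
    weights: Dict[str, float],
    max_points: float,
) -> float:
    """Combine module scores with a weighted average (an assumed scheme)."""
    total_weight = sum(weights[name] for name in modules)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = sum(
        weights[name] * modules[name](question, reference, answer)
        for name in modules
    ) / total_weight
    return round(combined * max_points, 2)
```

In use, the knowledge-point comparison, pseudo-question relevance, and simulated human evaluation modules would each wrap an LLM call and be registered in `modules` alongside `text_matching`, for example `grade(q, ref, ans, {"text": text_matching, "kp": my_kp_module}, {"text": 1.0, "kp": 2.0}, max_points=10)`, where `my_kp_module` is a placeholder for such a wrapper.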
