CL | Apr 7, 2025

Enhancing LLM-Based Short Answer Grading with Retrieval-Augmented Generation

arXiv:2504.05276v2 | 9 citations | h-index: 10 | EDM
Originality: Incremental advance
AI Analysis

This work addresses the challenge of automating short answer assessment in science education to reduce human grader workload. It appears incremental, since it builds on existing RAG methods.

The paper tackles LLMs' limited domain knowledge in short answer grading by proposing an adaptive retrieval-augmented generation framework, which improves grading accuracy over baseline LLM approaches on a science education dataset.

Short answer assessment is a vital component of science education, allowing evaluation of students' complex three-dimensional understanding. Large language models (LLMs) that possess human-like ability in linguistic tasks are increasingly used to assist human graders and reduce their workload. However, LLMs' limited domain knowledge restricts their understanding of task-specific requirements and hinders their ability to achieve satisfactory performance. Retrieval-augmented generation (RAG) emerges as a promising solution by enabling LLMs to access relevant domain-specific knowledge during assessment. In this work, we propose an adaptive RAG framework for automated grading that dynamically retrieves and incorporates domain-specific knowledge based on the question and student answer context. Our approach combines semantic search and curated educational sources to retrieve valuable reference materials. Experimental results on a science education dataset demonstrate that our system achieves an improvement in grading accuracy compared to baseline LLM approaches. The findings suggest that RAG-enhanced grading systems can serve as reliable support with efficient performance gains.
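
The retrieve-then-grade flow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names (`retrieve`, `build_grading_prompt`) are hypothetical, a bag-of-words cosine similarity stands in for the semantic search, and the prompt wording is invented. The abstract does not specify the paper's actual retriever, sources, or prompts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, student_answer, corpus, top_k=2):
    """Rank reference passages against the question plus the student answer.

    Assumption: word-overlap cosine similarity is a simple stand-in for the
    semantic search mentioned in the abstract, not the paper's retriever.
    """
    query = Counter(tokenize(question + " " + student_answer))
    scored = [(cosine(query, Counter(tokenize(p))), p) for p in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only passages that share at least one term with the query.
    return [passage for score, passage in scored[:top_k] if score > 0]


def build_grading_prompt(question, student_answer, passages):
    """Assemble a grading prompt that includes the retrieved passages.

    Assumption: the prompt wording here is hypothetical.
    """
    references = "\n".join(f"- {p}" for p in passages) or "- (none found)"
    return (
        "Reference material:\n"
        f"{references}\n\n"
        f"Question: {question}\n"
        f"Student answer: {student_answer}\n"
        "Grade the student answer using the reference material."
    )


if __name__ == "__main__":
    corpus = [
        "Ice floats because solid water is less dense than liquid water.",
        "Plate tectonics describes the motion of lithospheric plates.",
    ]
    question = "Why does ice float on water?"
    answer = "Ice is less dense than liquid water."
    print(build_grading_prompt(question, answer, retrieve(question, answer, corpus)))
```

Building the query from both the question and the student answer mirrors the abstract's statement that retrieval depends on the question and student answer context. The resulting prompt would then be passed to an LLM grader, which is left out of this sketch.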
