CL | May 28

Learnable Assessment Skills for LLM-based Automated Scoring: Rubric Construction via Iterative Optimization

Yun Wang, Xin Xia, Xuansheng Wu, Xiaoming Zhai, Ninghao Liu

arXiv:2605.29274 | 67.7 | h-index: 17

AI Analysis

This work addresses the scalability bottleneck in LLM-based automated scoring by reducing reliance on human expertise for rubric construction, offering a practical solution for deploying scoring systems across diverse tasks.

LLM-based automated scoring faces a bottleneck in scaling to new tasks due to the need for per-item human rubric construction. The authors propose an iterative framework that learns assessment skills—item-independent procedural knowledge—from scoring experience, achieving substantial improvements on all ten ASAP-SAS items and often surpassing expert rubrics.

LLM-based automated scoring approaches near-human performance, but scaling to new tasks remains bottlenecked by the per-item human configuration of upstream stages such as rubric construction. Human experts bypass this bottleneck through evaluation heuristics developed over extensive practice. We ask whether LLMs can learn similar heuristics directly from scoring experience, and formalize this as the concept of assessment skills: item-independent natural-language procedural knowledge that guides LLMs through specific stages of the scoring workflow. Focusing on rubric construction as a first instantiation, we propose an iterative framework that decomposes a skill into a fixed scaffold and learnable item-agnostic rules, refining the rules through LLM-driven diagnosis of scoring errors and validation-gated selection. The framework requires no expert-written rubric. On all ten ASAP-SAS items, optimized skills substantially improve LLM-based scoring and frequently surpass the dataset-provided expert rubric. Cross-item transfer experiments further reveal that learned skills capture both generalizable and item-specific patterns.
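The refinement loop described in the abstract (fixed scaffold, learnable rules, diagnosis of scoring errors, validation-gated selection) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Skill`, `optimize_skill`, `score_fn`, and `diagnose_fn` are hypothetical, `score_fn` collapses "construct a rubric with the skill, then score the response" into a single call, and plain exact-match accuracy stands in for whatever agreement metric the paper actually reports.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, int]          # (student response, human score)
ErrorCase = Tuple[str, int, int]   # (response, human score, predicted score)


@dataclass(frozen=True)
class Skill:
    scaffold: str                  # fixed part of the skill; never edited by the loop
    rules: Tuple[str, ...]         # learnable, item-agnostic rules


def accuracy(score_fn: Callable[[Skill, str], int],
             skill: Skill,
             data: Sequence[Example]) -> float:
    """Exact-match agreement with human scores (placeholder metric, an assumption)."""
    hits = sum(score_fn(skill, response) == gold for response, gold in data)
    return hits / len(data)


def optimize_skill(skill: Skill,
                   train: Sequence[Example],
                   val: Sequence[Example],
                   score_fn: Callable[[Skill, str], int],
                   diagnose_fn: Callable[[Skill, List[ErrorCase]], List[str]],
                   rounds: int = 3) -> Tuple[Skill, float]:
    """Refine the rules of a skill; keep a candidate only if validation improves."""
    best, best_val = skill, accuracy(score_fn, skill, val)
    for _ in range(rounds):
        # 1. Collect scoring errors made with the current best skill.
        errors: List[ErrorCase] = []
        for response, gold in train:
            predicted = score_fn(best, response)
            if predicted != gold:
                errors.append((response, gold, predicted))
        if not errors:
            break
        # 2. Diagnosis step (an LLM call in practice) proposes a revised rule list.
        proposed_rules = diagnose_fn(best, errors)
        candidate = Skill(best.scaffold, tuple(proposed_rules))
        # 3. Validation gate: accept only a strict improvement on held-out data.
        candidate_val = accuracy(score_fn, candidate, val)
        if candidate_val > best_val:
            best, best_val = candidate, candidate_val
    return best, best_val
```

In practice both `score_fn` and `diagnose_fn` would wrap LLM calls; passing them in as plain callables keeps the gating logic testable with deterministic stubs. The scaffold is copied unchanged into every candidate, so only the rules can move between rounds.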
