CL, Apr 3, 2025

CoTAL: Human-in-the-Loop Prompt Engineering for Generalizable Formative Assessment Scoring

Clayton Cohn, Ashwin T S, Naveeduddin Mohammed, Gautam Biswas

arXiv:2504.02323v3
4.93 citations
h-index: 8

Originality: Incremental advance

AI Analysis

The work addresses the challenge of automating and improving assessment scoring for teachers and students, though it appears incremental because it builds on existing prompt engineering methods.

The paper tackles the problem of generalizing LLM-based formative assessment scoring across domains such as science, computing, and engineering. It introduces CoTAL, which combines evidence-centered design, human-in-the-loop prompt engineering, and chain-of-thought prompting, and reports gains of up to 38.9% over a non-prompt-engineered baseline.

Large language models (LLMs) have created new opportunities to assist teachers and support student learning. While researchers have explored various prompt engineering approaches in educational contexts, the degree to which these approaches generalize across domains--such as science, computing, and engineering--remains underexplored. In this paper, we introduce Chain-of-Thought Prompting + Active Learning (CoTAL), an LLM-based approach to formative assessment scoring that (1) leverages Evidence-Centered Design (ECD) to align assessments and rubrics with curriculum goals, (2) applies human-in-the-loop prompt engineering to automate response scoring, and (3) incorporates chain-of-thought (CoT) prompting and teacher and student feedback to iteratively refine questions, rubrics, and LLM prompts. Our findings demonstrate that CoTAL improves GPT-4's scoring performance across domains, achieving gains of up to 38.9% over a non-prompt-engineered baseline (i.e., without labeled examples, chain-of-thought prompting, or iterative refinement). Teachers and students judge CoTAL to be effective at scoring and explaining responses, and their feedback produces valuable insights that enhance grading accuracy and explanation quality.
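To make the prompting idea in the abstract concrete, here is a minimal sketch of how a few-shot, chain-of-thought scoring prompt might be assembled from a rubric and a handful of teacher-labeled examples. It is not the authors' prompt or code: the `LabeledExample` type, the `build_scoring_prompt` function, and the prompt wording are all hypothetical, and no model is called.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabeledExample:
    """A teacher-scored response with the reasoning behind the score (hypothetical type)."""
    response: str
    reasoning: str
    score: int


def build_scoring_prompt(
    question: str,
    rubric: str,
    examples: List[LabeledExample],
    student_response: str,
) -> str:
    """Assemble a few-shot chain-of-thought prompt as a single string.

    Each labeled example shows its reasoning before its score, so the model
    is nudged to explain first and score second for the new response.
    """
    parts = [
        "You are scoring a formative assessment response.",
        f"Question: {question}",
        f"Rubric: {rubric}",
    ]
    for i, ex in enumerate(examples, start=1):
        parts.append(
            f"Example {i} response: {ex.response}\n"
            f"Reasoning: {ex.reasoning}\n"
            f"Score: {ex.score}"
        )
    # End on an open "Reasoning:" cue so the completion starts with an explanation.
    parts.append(f"Student response: {student_response}\nReasoning:")
    return "\n\n".join(parts)
```

In a human-in-the-loop setting like the one the abstract describes, the `examples` list and the `rubric` text would be the pieces that teachers revise between rounds, with the prompt rebuilt after each revision.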
