SECL Feb 11

ISD-Agent-Bench: A Comprehensive Benchmark for Evaluating LLM-based Instructional Design Agents

arXiv:2602.10620v1
h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses the lack of standardized benchmarks for LLM-based instructional design agents, benefiting researchers in educational technology, though it is incremental as it builds on existing ISD theories and agent methods.

The authors address the challenge of evaluating LLM-based agents for Instructional Systems Design (ISD) by introducing ISD-Agent-Bench, a benchmark of 25,795 scenarios. On 1,017 test scenarios, agents that integrate classical ISD frameworks with ReAct-style reasoning achieved the highest performance, outperforming both pure theory-based agents and technique-only approaches.

Large Language Model (LLM) agents have shown promising potential in automating Instructional Systems Design (ISD), a systematic approach to developing educational programs. However, evaluating these agents remains challenging due to the lack of standardized benchmarks and the risk of LLM-as-judge bias. We present ISD-Agent-Bench, a comprehensive benchmark comprising 25,795 scenarios generated via a Context Matrix framework that combines 51 contextual variables across 5 categories with 33 ISD sub-steps derived from the ADDIE model. To ensure evaluation reliability, we employ a multi-judge protocol using diverse LLMs from different providers, achieving high inter-judge reliability. We compare existing ISD agents with novel agents grounded in classical ISD theories such as ADDIE, Dick & Carey, and Rapid Prototyping ISD. Experiments on 1,017 test scenarios demonstrate that integrating classical ISD frameworks with modern ReAct-style reasoning achieves the highest performance, outperforming both pure theory-based agents and technique-only approaches. Further analysis reveals that theoretical quality strongly correlates with benchmark performance, with theory-based agents showing significant advantages in problem-centered design and objective-assessment alignment. Our work provides a foundation for systematic LLM-based ISD research.
