AGACCI: Affiliated Grading Agents for Criteria-Centric Interface in Educational Coding Contexts
The work addresses the need for scalable, context-aware evaluation of coding assignments, though it appears to be an incremental improvement over existing VLM-based approaches.
The paper tackles AI-assisted evaluation of complex educational artifacts such as programming tasks. It introduces AGACCI, a multi-agent system that outperforms a single GPT-based baseline in accuracy, relevance, consistency, and coherence on a dataset of 360 graduate-level code assignments.
Recent advances in AI-assisted education have encouraged the integration of vision-language models (VLMs) into academic assessment, particularly for tasks that require both quantitative and qualitative evaluation. However, existing VLM-based approaches struggle with complex educational artifacts, such as programming tasks with executable components and measurable outputs, which require structured reasoning and alignment with clearly defined evaluation criteria. We introduce AGACCI, a multi-agent system that distributes specialized evaluation roles across collaborative agents to improve accuracy, interpretability, and consistency in code-oriented assessment. To evaluate the framework, we collected 360 graduate-level code-based assignments from 60 participants, each annotated by domain experts with binary rubric scores and qualitative feedback. Experimental results demonstrate that AGACCI outperforms a single GPT-based baseline in rubric and feedback accuracy, relevance, consistency, and coherence, while preserving the instructional intent and evaluative depth of expert assessments. Although performance varies across task types, AGACCI highlights the potential of multi-agent systems for scalable and context-aware educational evaluation.
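To illustrate how binary rubric scores from an automated grader might be compared against expert annotations, a minimal Python sketch follows. The function names and the plain agreement-rate definition of accuracy are assumptions made for illustration; the abstract does not specify how rubric accuracy is computed, and this is not the authors' implementation.

```python
from typing import Sequence


def rubric_accuracy(predicted: Sequence[int], expert: Sequence[int]) -> float:
    """Fraction of rubric items where the predicted binary score (0 or 1)
    matches the expert's score. Hypothetical metric, assumed for illustration."""
    if len(predicted) != len(expert):
        raise ValueError("predicted and expert must have the same length")
    if len(expert) == 0:
        raise ValueError("at least one rubric item is required")
    matches = sum(1 for p, e in zip(predicted, expert) if p == e)
    return matches / len(expert)


def mean_rubric_accuracy(
    assignments: Sequence[tuple[Sequence[int], Sequence[int]]]
) -> float:
    """Average per-assignment rubric accuracy over (predicted, expert) pairs."""
    if len(assignments) == 0:
        raise ValueError("at least one assignment is required")
    scores = [rubric_accuracy(pred, exp) for pred, exp in assignments]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data only: two assignments with four rubric items each.
    toy = [
        ([1, 0, 1, 1], [1, 1, 1, 1]),
        ([0, 0, 1, 1], [0, 0, 1, 1]),
    ]
    print(mean_rubric_accuracy(toy))
```

Averaging per assignment rather than pooling all rubric items gives each assignment equal weight regardless of how many criteria it has; a pooled variant would be an equally reasonable choice.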