AI · Jul 18, 2025

Buggy rule diagnosis for combined steps through final answer evaluation in stepwise tasks

arXiv:2507.13651v1 · h-index: 2 · AIED
Originality: Incremental advance
AI Analysis

This work addresses error diagnosis in intelligent tutoring systems for stepwise tasks, offering a practical solution to a specific bottleneck in educational technology.

The study tackled the problem of diagnosing errors in stepwise tasks when students combine multiple steps, which leads to a combinatorial explosion of possible solution paths. By using final answer evaluation, the approach diagnosed 29.4% of previously undiagnosed steps in a dataset of quadratic equation solutions, with diagnoses aligning with teacher assessments in 97% of cases.

Many intelligent tutoring systems can support a student in solving a stepwise task. When a student combines several steps into one step, the number of possible paths connecting consecutive inputs may be very large. This combinatorial explosion makes error diagnosis hard. Using a final answer to diagnose a combination of steps can mitigate the combinatorial explosion, because there are generally fewer possible (erroneous) final answers than (erroneous) solution paths. An intermediate input for a task can be diagnosed by automatically completing it according to the task solution strategy and diagnosing the resulting solution. This study explores the potential of automated error diagnosis based on a final answer. We investigate the design of a service that provides a buggy rule diagnosis when a student combines several steps. To validate the approach, we apply the service to an existing dataset (n=1939) of unique student steps in solving quadratic equations, none of which could be diagnosed by a buggy rule service that tries to connect consecutive inputs with a single rule. Results show that final answer evaluation can diagnose 29.4% of these steps. Moreover, a comparison of the generated diagnoses with teacher diagnoses on a subset (n=115) shows that the diagnoses align in 97% of the cases. These results can be considered a basis for further exploration of the approach.
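The idea of completing an intermediate input and diagnosing the resulting final answer can be illustrated with a minimal sketch. It is not the service described in the paper. The coefficient-tuple representation `(a, b, c)` for a*x^2 + b*x + c = 0, the function names, and the two buggy rules are all hypothetical, and the solver is restricted to equations with rational roots.

```python
import math
from fractions import Fraction

def solve(a, b, c):
    """Solve a*x^2 + b*x + c = 0 exactly; this sketch only handles rational roots."""
    d = b * b - 4 * a * c
    if d < 0:
        return frozenset()
    r = math.isqrt(d)
    if r * r != d:
        raise ValueError("sketch only handles rational roots")
    return frozenset({Fraction(-b - r, 2 * a), Fraction(-b + r, 2 * a)})

# Hypothetical buggy rules: each maps the task to the erroneous final answer
# a student would reach by making that mistake somewhere along the way.
BUGGY_RULES = {
    "sign error in constant term": lambda a, b, c: solve(a, b, -c),
    "sign error in linear term": lambda a, b, c: solve(a, -b, c),
}

def diagnose(task, student_step):
    """Complete the student's step to a final answer and compare final answers only."""
    completed = solve(*student_step)  # automatic completion of the intermediate input
    if completed == solve(*task):
        return "correct"
    for name, rule in BUGGY_RULES.items():
        try:
            if rule(*task) == completed:
                return name
        except ValueError:
            continue  # this buggy answer is outside what the sketch can represent
    return None  # no buggy rule explains the final answer

# Task: x^2 - 5x + 6 = 0. The student jumps to (x + 2)(x + 3) = 0,
# which expands to x^2 + 5x + 6 = 0.
print(diagnose((1, -5, 6), (1, 5, 6)))
```

Because only final answers are compared, the sketch never enumerates the intermediate paths between the task and the student's input, which is the point the abstract makes about avoiding the combinatorial explosion. For scale, 29.4% of the 1939 steps corresponds to roughly 570 steps.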

