AI · Feb 6

BEAGLE: Behavior-Enforced Agent for Grounded Learner Emulation

Hanchen David Wang, Clayton Cohn, Zifan Xu, Siyuan Guo, Gautam Biswas, Meiyi Ma

arXiv:2602.13280v12.4 · h-index: 11

Originality: Highly original

AI Analysis

This work addresses data scarcity in education research, for example when training adaptive tutoring systems, by providing a method for generating realistic student simulations, though it is an incremental improvement over existing LLM-based approaches.

The paper tackles the problem of simulating authentic student learning behaviors in open-ended problem-solving environments by introducing BEAGLE, a neuro-symbolic framework that incorporates Self-Regulated Learning theory. In a human Turing test, users could not distinguish its synthetic traces from real student data (52.8% accuracy, close to chance).

Simulating student learning behaviors in open-ended problem-solving environments holds potential for education research, from training adaptive tutoring systems to stress-testing pedagogical interventions. However, collecting authentic data is challenging due to privacy concerns and the high cost of longitudinal studies. While Large Language Models (LLMs) offer a promising path to student simulation, they suffer from competency bias, optimizing for efficient correctness rather than the erratic, iterative struggle characteristic of novice learners. We present BEAGLE, a neuro-symbolic framework that addresses this bias by incorporating Self-Regulated Learning (SRL) theory into a novel architecture. BEAGLE integrates three key technical innovations: (1) a semi-Markov model that governs the timing and transitions of cognitive and metacognitive behaviors; (2) Bayesian Knowledge Tracing with explicit flaw injection to enforce realistic knowledge gaps and "unknown unknowns"; and (3) a decoupled agent design that separates high-level strategy use from code generation actions to prevent the model from silently correcting its own intentional errors. In evaluations on Python programming tasks, BEAGLE significantly outperforms state-of-the-art baselines in reproducing authentic trajectories. In a human Turing test, users were unable to distinguish synthetic traces from real student data: their accuracy (52.8%) was indistinguishable from random guessing.
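To make the first two components above more concrete, here is a minimal sketch of a semi-Markov behavior sampler and a textbook Bayesian Knowledge Tracing update. It is not BEAGLE's implementation: the state names, transition probabilities, dwell-time distribution, and BKT parameters are all hypothetical values chosen for illustration, and flaw injection and the decoupled agent design are not modeled.

```python
import random

# Hypothetical behavior states and probabilities, chosen only for illustration.
# There are no self-transitions: time spent in a state comes from the dwell draw.
TRANSITIONS = {
    "plan": {"code": 0.8, "debug": 0.2},
    "code": {"debug": 0.6, "plan": 0.4},
    "debug": {"code": 0.7, "plan": 0.3},
}
# Hypothetical mean dwell time per state, in seconds.
MEAN_DWELL_SECONDS = {"plan": 30.0, "code": 90.0, "debug": 60.0}
GAMMA_SHAPE = 2.0  # assumed shape; a non-exponential dwell distribution


def sample_trace(start, steps, rng):
    """Sample a list of (state, dwell_seconds) pairs from a semi-Markov chain."""
    trace = []
    state = start
    for _ in range(steps):
        # Dwell time: gamma-distributed with the state's mean (shape * scale).
        scale = MEAN_DWELL_SECONDS[state] / GAMMA_SHAPE
        dwell = rng.gammavariate(GAMMA_SHAPE, scale)
        trace.append((state, dwell))
        # Next state: drawn from the current state's transition row.
        candidates = list(TRANSITIONS[state])
        weights = [TRANSITIONS[state][s] for s in candidates]
        state = rng.choices(candidates, weights=weights, k=1)[0]
    return trace


def bkt_update(p_known, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """Standard BKT step: condition on the observation, then apply learning."""
    if correct:
        numerator = p_known * (1.0 - p_slip)
        evidence = numerator + (1.0 - p_known) * p_guess
    else:
        numerator = p_known * p_slip
        evidence = numerator + (1.0 - p_known) * (1.0 - p_guess)
    posterior = numerator / evidence
    # Chance that an unmastered skill becomes mastered after this attempt.
    return posterior + (1.0 - posterior) * p_learn


if __name__ == "__main__":
    rng = random.Random(0)
    for state, dwell in sample_trace("plan", 5, rng):
        print(f"{state:5s} {dwell:6.1f}s")
    print(round(bkt_update(0.5, correct=True), 4))
    print(round(bkt_update(0.5, correct=False), 4))
```

Separating dwell times from transitions is what distinguishes a semi-Markov model from an ordinary Markov chain: how long a simulated learner stays in a behavior is drawn from its own distribution rather than implied by repeated self-transitions.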
