Roger Azevedo

AI
h-index82
7papers
53citations
Novelty42%
AI Score46

7 Papers

24.4HCMar 27
Unlocking Open-Player-Modeling-enhanced Game-Based Learning: The Open Player Socially Analytical Intelligence Architecture

Zhiyu Lin, Boyd Fox, Devon Mckee et al. · gatech

Game-Based Learning (GBL) is a learner-engaging pedagogical methodology, yet adapting games to heterogeneous learners requires transparent, real-time Open Player Models (OPMs). We contribute to the community Open Player Socially Analytical Intelligence (OPSAI), an architecture implementing OPM beyond conceptual frameworks and validated in a GBL application. It decouples gameplay telemetry and analysis from the game engine and automatically derives pedagogically actionable insights, supporting the transparency of computational player models while making them accessible to players. OPSAI comprises three logical layers: a Frontend that both provides the GBL experience and collects information needed for analytics; a stateless Backend that hosts transparent analytics services producing reflective prompts, recommendations, and visualization guides; and a two-tier Log Storage that balances heavy raw gameplay data with lightweight reference indices for low-latency queries. By feeding analytics outputs back into the game interface, OPSAI closes the feedback loop between play and learning, empowering teachers, researchers, and learners alike. We further showcase OPSAI with a full deployment on the Parallel GBL environment, featuring live play traces, peer comparisons, and personalized suggestions, demonstrating a reusable blueprint for future educational games.

AINov 1, 2023
Augmenting deep neural networks with symbolic knowledge: Towards trustworthy and interpretable AI for education

Danial Hooshyar, Roger Azevedo, Yeongwook Yang

Artificial neural networks (ANNs) have shown to be amongst the most important artificial intelligence (AI) techniques in educational applications, providing adaptive educational services. However, their educational potential is limited in practice due to three major challenges: i) difficulty in incorporating symbolic educational knowledge (e.g., causal relationships, and practitioners' knowledge) in their development, ii) learning and reflecting biases, and iii) lack of interpretability. Given the high-risk nature of education, the integration of educational knowledge into ANNs becomes crucial for developing AI applications that adhere to essential educational restrictions, and provide interpretability over the predictions. This research argues that the neural-symbolic family of AI has the potential to address the named challenges. To this end, it adapts a neural-symbolic AI framework and accordingly develops an approach called NSAI, that injects and extracts educational knowledge into and from deep neural networks, for modelling learners computational thinking. Our findings reveal that the NSAI approach has better generalizability compared to deep neural networks trained merely on training data, as well as training data augmented by SMOTE and autoencoder methods. More importantly, unlike the other models, the NSAI approach prioritises robust representations that capture causal relationships between input features and output labels, ensuring safety in learning to avoid spurious correlations and control biases in training data. Furthermore, the NSAI approach enables the extraction of rules from the learned network, facilitating interpretation and reasoning about the path to predictions, as well as refining the initial educational knowledge. These findings imply that neural-symbolic AI can overcome the limitations of ANNs in education, enabling trustworthy and interpretable applications.

AIDec 28, 2025
Problems With Large Language Models for Learner Modelling: Why LLMs Alone Fall Short for Responsible Tutoring in K--12 Education

Danial Hooshyar, Yeongwook Yang, Gustav Šíř et al.

The rapid rise of large language model (LLM)-based tutors in K--12 education has fostered a misconception that generative models can replace traditional learner modelling for adaptive instruction. This is especially problematic in K--12 settings, which the EU AI Act classifies as high-risk domain requiring responsible design. Motivated by these concerns, this study synthesises evidence on limitations of LLM-based tutors and empirically investigates one critical issue: the accuracy, reliability, and temporal coherence of assessing learners' evolving knowledge over time. We compare a deep knowledge tracing (DKT) model with a widely used LLM, evaluated zero-shot and fine-tuned, using a large open-access dataset. Results show that DKT achieves the highest discrimination performance (AUC = 0.83) on next-step correctness prediction and consistently outperforms the LLM across settings. Although fine-tuning improves the LLM's AUC by approximately 8\% over the zero-shot baseline, it remains 6\% below DKT and produces higher early-sequence errors, where incorrect predictions are most harmful for adaptive support. Temporal analyses further reveal that DKT maintains stable, directionally correct mastery updates, whereas LLM variants exhibit substantial temporal weaknesses, including inconsistent and wrong-direction updates. These limitations persist despite the fine-tuned LLM requiring nearly 198 hours of high-compute training, far exceeding the computational demands of DKT. Our qualitative analysis of multi-skill mastery estimation further shows that, even after fine-tuning, the LLM produced inconsistent mastery trajectories, while DKT maintained smooth and coherent updates. Overall, the findings suggest that LLMs alone are unlikely to match the effectiveness of established intelligent tutoring systems, and that responsible tutoring requires hybrid frameworks that incorporate learner modelling.

12.3AIApr 9
Neural-Symbolic Knowledge Tracing: Injecting Educational Knowledge into Deep Learning for Responsible Learner Modelling

Danial Hooshyar, Gustav Šír, Yeongwook Yang et al.

The growing use of artificial intelligence (AI) in education, particularly large language models (LLMs), has increased interest in intelligent tutoring systems. However, LLMs often show limited adaptivity and struggle to model learners' evolving knowledge over time, highlighting the need for dedicated learner modelling approaches. Although deep knowledge tracing methods achieve strong predictive performance, their opacity and susceptibility to bias can limit alignment with pedagogical principles. To address this, we propose Responsible-DKT, a neural-symbolic deep knowledge tracing approach that integrates symbolic educational knowledge (e.g., mastery and non-mastery rules) into sequential neural models for responsible learner modelling. Experiments on a real-world dataset of students' math interactions show that Responsible-DKT outperforms both a neural-symbolic baseline and a fully data-driven PyTorch DKT model across training settings. The model achieves over 0.80 AUC with only 10% of training data and up to 0.90 AUC, improving performance by up to 13%. It also demonstrates improved temporal reliability, producing lower early- and mid-sequence prediction errors and the lowest prediction inconsistency rates across sequence lengths, indicating that prediction updates remain directionally aligned with observed student responses over time. Furthermore, the neural-symbolic approach offers intrinsic interpretability via a grounded computation graph that exposes the logic behind each prediction, enabling both local and global explanations. It also allows empirical evaluation of pedagogical assumptions, revealing that repeated incorrect responses (non-mastery) strongly influence prediction updates. These results indicate that neural-symbolic approaches enhance both performance and interpretability, mitigate data limitations, and support more responsible, human-centered AI in education.

CYApr 22, 2025
Towards responsible AI for education: Hybrid human-AI to confront the Elephant in the room

Danial Hooshyar, Gustav Šír, Yeongwook Yang et al.

Despite significant advancements in AI-driven educational systems and ongoing calls for responsible AI for education, several critical issues remain unresolved -- acting as the elephant in the room within AI in education, learning analytics, educational data mining, learning sciences, and educational psychology communities. This critical analysis identifies and examines nine persistent challenges that continue to undermine the fairness, transparency, and effectiveness of current AI methods and applications in education. These include: (1) the lack of clarity around what AI for education truly means -- often ignoring the distinct purposes, strengths, and limitations of different AI families -- and the trend of equating it with domain-agnostic, company-driven large language models; (2) the widespread neglect of essential learning processes such as motivation, emotion, and (meta)cognition in AI-driven learner modelling and their contextual nature; (3) limited integration of domain knowledge and lack of stakeholder involvement in AI design and development; (4) continued use of non-sequential machine learning models on temporal educational data; (5) misuse of non-sequential metrics to evaluate sequential models; (6) use of unreliable explainable AI methods to provide explanations for black-box models; (7) ignoring ethical guidelines in addressing data inconsistencies during model training; (8) use of mainstream AI methods for pattern discovery and learning analytics without systematic benchmarking; and (9) overemphasis on global prescriptions while overlooking localised, student-specific recommendations. Supported by theoretical and empirical research, we demonstrate how hybrid AI methods -- specifically neural-symbolic AI -- can address the elephant in the room and serve as the foundation for responsible, trustworthy AI systems in education.

CLFeb 1
Large Language Models as Students Who Think Aloud: Overly Coherent, Verbose, and Confident

Conrad Borchers, Jill-Jênn Vie, Roger Azevedo

Large language models (LLMs) are increasingly embedded in AI-based tutoring systems. Can they faithfully model novice reasoning and metacognitive judgments? Existing evaluations emphasize problem-solving accuracy, overlooking the fragmented and imperfect reasoning that characterizes human learning. We evaluate LLMs as novices using 630 think-aloud utterances from multi-step chemistry tutoring problems with problem-solving logs of student hint use, attempts, and problem context. We compare LLM-generated reasoning to human learner utterances under minimal and extended contextual prompting, and assess the models' ability to predict step-level learner success. Although GPT-4.1 generates fluent and contextually appropriate continuations, its reasoning is systematically over-coherent, verbose, and less variable than human think-alouds. These effects intensify with a richer problem-solving context during prompting. Learner performance was consistently overestimated. These findings highlight epistemic limitations of simulating learning with LLMs. We attribute these limitations to LLM training data, including expert-like solutions devoid of expressions of affect and working memory constraints during problem solving. Our evaluation framework can guide future design of adaptive systems that more faithfully support novice learning and self-regulation using generative artificial intelligence.

AIApr 1, 2025
Towards Responsible and Trustworthy Educational Data Mining: Comparing Symbolic, Sub-Symbolic, and Neural-Symbolic AI Methods

Danial Hooshyar, Eve Kikas, Yeongwook Yang et al.

Given the demand for responsible and trustworthy AI for education, this study evaluates symbolic, sub-symbolic, and neural-symbolic AI (NSAI) in terms of generalizability and interpretability. Our extensive experiments on balanced and imbalanced self-regulated learning datasets of Estonian primary school students predicting 7th-grade mathematics national test performance showed that symbolic and sub-symbolic methods performed well on balanced data but struggled to identify low performers in imbalanced datasets. Interestingly, symbolic and sub-symbolic methods emphasized different factors in their decision-making: symbolic approaches primarily relied on cognitive and motivational factors, while sub-symbolic methods focused more on cognitive aspects, learnt knowledge, and the demographic variable of gender -- yet both largely overlooked metacognitive factors. The NSAI method, on the other hand, showed advantages by: (i) being more generalizable across both classes -- even in imbalanced datasets -- as its symbolic knowledge component compensated for the underrepresented class; and (ii) relying on a more integrated set of factors in its decision-making, including motivation, (meta)cognition, and learnt knowledge, thus offering a comprehensive and theoretically grounded interpretability framework. These contrasting findings highlight the need for a holistic comparison of AI methods before drawing conclusions based solely on predictive performance. They also underscore the potential of hybrid, human-centred NSAI methods to address the limitations of other AI families and move us closer to responsible AI for education. Specifically, by enabling stakeholders to contribute to AI design, NSAI aligns learned patterns with theoretical constructs, incorporates factors like motivation and metacognition, and strengthens the trustworthiness and responsibility of educational data mining.