Tanya Nazaretsky

HC
h-index: 34
9 papers
51 citations
Novelty: 39%
AI Score: 52

9 Papers

CL · Oct 30, 2025 · Code
SCRIBE: Structured Chain Reasoning for Interactive Behaviour Explanations using Tool Calling

Fares Fawzi, Vinitra Swamy, Dominik Glandorf et al.

Language models can be used to provide interactive, personalized student feedback in educational settings. However, real-world deployment faces three key challenges: privacy concerns, limited computational resources, and the need for pedagogically valid responses. These constraints require small, open-source models that can run locally and reliably ground their outputs in correct information. We introduce SCRIBE, a framework for multi-hop, tool-augmented reasoning designed to generate valid responses to student questions about feedback reports. SCRIBE combines domain-specific tools with a self-reflective inference pipeline that supports iterative reasoning, tool use, and error recovery. We distil these capabilities into 3B and 8B models via two-stage LoRA fine-tuning on synthetic GPT-4o-generated data. Evaluation with a human-aligned GPT-Judge and a user study with 108 students shows that 8B-SCRIBE models achieve comparable or superior quality to much larger models in key dimensions such as relevance and actionability, while being perceived on par with GPT-4o and Llama-3.3 70B by students. These findings demonstrate the viability of SCRIBE for low-resource, privacy-sensitive educational applications.

69.8 · AI · Mar 31 · Code
REFINE: Real-world Exploration of Interactive Feedback and Student Behaviour

Fares Fawzi, Seyed Parsa Neshaei, Marta Knezevic et al.

Formative feedback is central to effective learning, yet providing timely, individualised feedback at scale remains a persistent challenge. While recent work has explored the use of large language models (LLMs) to automate feedback, most existing systems still conceptualise feedback as a static, one-way artifact, offering limited support for interpretation, clarification, or follow-up. In this work, we introduce REFINE, a locally deployable, multi-agent feedback system built on small, open-source LLMs that treats feedback as an interactive process. REFINE combines a pedagogically-grounded feedback generation agent with an LLM-as-a-judge-guided regeneration loop using a human-aligned judge, and a self-reflective tool-calling interactive agent that supports student follow-up questions with context-aware, actionable responses. We evaluate REFINE through controlled experiments and an authentic classroom deployment in an undergraduate computer science course. Automatic evaluations show that judge-guided regeneration significantly improves feedback quality, and that the interactive agent produces efficient, high-quality responses comparable to a state-of-the-art closed-source model. Analysis of real student interactions further reveals distinct engagement patterns and indicates that system-generated feedback systematically steers subsequent student inquiry. Our findings demonstrate the feasibility and effectiveness of multi-agent, tool-augmented feedback systems for scalable, interactive feedback.

39.3 · HC · Apr 10
Structuring versus Problematizing: How LLM-based Agents Scaffold Learning in Diagnostic Reasoning

Fatma Betül Güreş, Tanya Nazaretsky, Seyed Parsa Neshaei et al.

Supporting students in developing diagnostic reasoning is a key challenge across educational domains. Novices often face cognitive biases such as premature closure and over-reliance on heuristics, and they struggle to transfer diagnostic strategies to new cases. Scenario-based learning (SBL) enhanced by Learning Analytics (LA) and large language models (LLMs) offers a promising approach by combining realistic case experiences with personalized scaffolding. Yet, how different scaffolding approaches shape reasoning processes remains insufficiently explored. This study introduces PharmaSim Switch, an SBL environment for pharmacy technician training, extended with an LA- and LLM-powered pharmacist agent that implements pedagogical conversations rooted in two theory-driven scaffolding approaches, structuring and problematizing, as well as a student learning trajectory. In a between-groups experiment, 63 vocational students completed a learning scenario, a near-transfer scenario, and a far-transfer scenario under one of the two scaffolding conditions. Results indicate that both scaffolding approaches were effective in supporting the use of diagnostic strategies. Performance outcomes were primarily influenced by scenario complexity rather than students' prior knowledge or the scaffolding approach used. The structuring approach was associated with more accurate Active and Interactive participation, whereas problematizing elicited more Constructive engagement. These findings underscore the value of combining scaffolding approaches when designing LA- and LLM-based systems to effectively foster diagnostic reasoning.

20.6 · HC · May 6
Tailoring Scaffolding to Diagnostic Strategies: Theory-Informed LLM-Based Agents

Fatma Betül Güreş, Tanya Nazaretsky, Tanja Käser

Learning analytics systems increasingly integrate large language models (LLMs) to provide adaptive scaffolding in complex learning environments, yet personalization is often driven by global instructional choices rather than principled alignment with learning theory, limiting effectiveness and pedagogical grounding. In prior work, we examined how structuring and problematizing scaffolding approaches can be instantiated through LLM agents in a scenario-based learning environment for diagnostic reasoning. While both approaches supported learning, we observed systematic differences in learner interaction patterns and clear tendencies indicating that different diagnostic strategies benefited from distinct forms of scaffolding. Building on these findings, we propose a theory-informed scaffolding design grounded in the Knowledge Learning Instruction (KLI) framework, as different diagnostic strategies target different types of knowledge and require different instructional mechanisms. We use KLI to guide the alignment between strategy demands and scaffolding approaches and introduce a KLI-informed hybrid LLM agent that adapts its pedagogical support according to the diagnostic strategy being practiced, rather than applying a single global scaffolding approach. We hypothesize that this design could enable better learning gains.

IR · Dec 11, 2023
Finding Paths for Explainable MOOC Recommendation: A Learner Perspective

Jibril Frej, Neel Shah, Marta Knežević et al.

The increasing availability of Massive Open Online Courses (MOOCs) has created a necessity for personalized course recommendation systems. These systems often combine neural networks with Knowledge Graphs (KGs) to achieve richer representations of learners and courses. While these enriched representations allow more accurate and personalized recommendations, explainability remains a significant challenge, which is especially problematic in high-impact domains such as education and online learning. Recently, a novel class of recommender systems that uses reinforcement learning and graph reasoning over KGs has been proposed to generate explainable recommendations in the form of paths over a KG. Despite their accuracy and interpretability on e-commerce datasets, these approaches have scarcely been applied to the educational domain and their use in practice has not been studied. In this work, we propose an explainable recommendation system for MOOCs that uses graph reasoning. To validate the practical implications of our approach, we conducted a user study examining user perceptions of our new explainable recommendations. We demonstrate the generalizability of our approach by conducting experiments on two educational datasets: COCO and Xuetang.

HC · Apr 15, 2025
Evaluating Trust in AI, Human, and Co-produced Feedback Among Undergraduate Students

Audrey Zhang, Yifei Gao, Wannapon Suraworachet et al.

As generative AI models, particularly large language models (LLMs), transform educational feedback practices in higher education (HE) contexts, understanding students' perceptions of different sources of feedback becomes crucial for their effective implementation and adoption. This study addresses a critical gap by comparing undergraduate students' trust in LLM, human, and human-AI co-produced feedback in their authentic HE context. More specifically, through a within-subject experimental design involving 91 participants, we investigated factors that predict students' ability to distinguish between feedback types, their perceptions of feedback quality, and potential biases related to the source of feedback. Findings revealed that when the source was blinded, students generally preferred AI and co-produced feedback over human feedback regarding perceived usefulness and objectivity. However, they presented a strong bias against AI when the source of feedback was disclosed. In addition, only AI feedback suffered a decline in perceived genuineness when feedback sources were revealed, while co-produced feedback maintained its positive perception. Educational AI experience improved students' ability to identify LLM-generated feedback and increased their trust in all types of feedback. More years of students' experience using AI for general purposes were associated with lower perceived usefulness and credibility of feedback. These insights offer substantial evidence of the importance of source credibility and the need to enhance both feedback literacy and AI literacy to mitigate bias in student perceptions for AI-generated feedback to be adopted and impact education.

65.6 · CY · Mar 12
The Future of Feedback: How Can AI Help Transform Feedback to Be More Engaging, Effective, and Scalable?

Jennifer Meyer, Olaf Köller, Thorben Jansen et al.

With digital learning environments becoming more prevalent, the ease with which generative AI enables the scalable production of real-time, automated feedback holds the potential to reshape learning and teaching experiences. This meeting report synthesizes the interdisciplinary perspectives of 50 scholars from educational psychology, computer science, science education, and the learning sciences on the use of generative AI for feedback and its promises and risks in educational practice. We highlight points of convergence in the scholarship, identify areas of debate and unresolved challenges, and outline open questions and future directions for research and educational practice that emerged from structured small-group activities designed to bridge disciplinary barriers.

CY · May 8, 2025
How Instructional Sequence and Personalized Support Impact Diagnostic Strategy Learning

Fatma Betül Güreş, Tanya Nazaretsky, Bahar Radmehr et al.

Supporting students in developing effective diagnostic reasoning is a key challenge in various educational domains. Novices often struggle with cognitive biases such as premature closure and over-reliance on heuristics. Scenario-based learning (SBL) can address these challenges by offering realistic case experiences and iterative practice, but the optimal sequencing of instruction and problem-solving activities remains unclear. This study examines how personalized support can be incorporated into different instructional sequences and whether providing explicit diagnostic strategy instruction before (I-PS) or after problem-solving (PS-I) improves learning and its transfer. We employ a between-groups design in an online SBL environment called PharmaSim, which simulates real-world client interactions for pharmacy technician apprentices. Results indicate that while both instruction types are beneficial, PS-I leads to significantly higher performance in transfer tasks.

AI · Dec 20, 2018
Kappa Learning: A New Method for Measuring Similarity Between Educational Items Using Performance Data

Tanya Nazaretsky, Sara Hershkovitz, Giora Alexandron

Sequencing items in adaptive learning systems typically relies on a large pool of interactive assessment items (questions) that are analyzed into a hierarchy of skills or Knowledge Components (KCs). Educational data mining techniques can be used to analyze students' performance data in order to optimize the mapping of items to KCs. Standard methods that map items into KCs using item-similarity measures make the implicit assumption that students' performance on items that depend on the same skill should be similar. This assumption holds if the latent trait (mastery of the underlying skill) is relatively fixed during students' activity, as in the context of testing, which is the primary context in which these measures were developed and applied. However, in adaptive learning systems that aim for learning, and address subject matters such as K6 Math that consist of multiple sub-skills, this assumption does not hold. In this paper, we propose a new item-similarity measure, termed Kappa Learning (KL), which aims to address this gap. KL identifies similarity between items under the assumption of learning, namely, that learners' mastery of the underlying skills changes as they progress through the items. We evaluate Kappa Learning on data from a computerized tutor that teaches Fractions for 4th grade, with expert tagging as ground truth, and on simulated data. Our results show that clustering based on Kappa Learning outperforms clustering based on commonly used similarity measures (Cohen's Kappa, Yule, and Pearson).
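
For readers unfamiliar with the baseline measures named in this abstract, the following is a minimal sketch of Cohen's Kappa applied to two items' response vectors. It is not the Kappa Learning measure, which the abstract does not define. The function name `cohen_kappa`, the 0/1 encoding (1 = correct, one entry per student, same student order for both items), and the convention for the degenerate case are assumptions made for illustration.

```python
from typing import Sequence


def cohen_kappa(item_a: Sequence[int], item_b: Sequence[int]) -> float:
    """Cohen's kappa between two items' binary (0/1) response vectors.

    Assumed encoding (hypothetical): index i holds student i's result
    on the item, 1 for correct and 0 for incorrect.
    """
    if len(item_a) != len(item_b) or len(item_a) == 0:
        raise ValueError("vectors must be non-empty and of equal length")

    n = len(item_a)
    # Observed agreement: share of students with the same outcome on both items.
    observed = sum(x == y for x, y in zip(item_a, item_b)) / n

    # Chance agreement from each item's marginal success rate.
    p_a = sum(item_a) / n
    p_b = sum(item_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)

    if expected == 1.0:
        # Both vectors are constant and identical; kappa is undefined here.
        # Returning 1.0 is an assumed convention, not part of the definition.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    same_skill = cohen_kappa([1, 1, 0, 0], [1, 1, 0, 0])
    unrelated = cohen_kappa([1, 1, 0, 0], [1, 0, 1, 0])
    print(same_skill, unrelated)
```

A value near 1 means students tend to succeed or fail on both items together, which is the fixed-mastery assumption the abstract says breaks down when students learn between items.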