Vincent Aleven

CY · h-index 86 · 16 papers · 228 citations · Novelty 38% · AI Score 51

16 Papers

CL · Jul 5, 2023
Comparative Analysis of GPT-4 and Human Graders in Evaluating Praise Given to Students in Synthetic Dialogues

Dollaya Hirunyasiri, Danielle R. Thomas, Jionghao Lin et al. · cmu

Research suggests that providing specific and timely feedback to human tutors enhances their performance. However, providing such feedback is challenging because assessing tutor performance by human evaluators is time-consuming. Large language models, such as the AI chatbot ChatGPT, hold potential for offering constructive feedback to tutors in practical settings. Nevertheless, the accuracy of AI-generated feedback remains uncertain, and little research has investigated the ability of models like ChatGPT to deliver effective feedback. In this work-in-progress, we evaluate 30 dialogues generated by GPT-4 in a tutor-student setting. We use two prompting approaches, zero-shot chain of thought and few-shot chain of thought, to identify specific components of effective praise based on five criteria. We then compare the results of these approaches with those of human graders to assess accuracy. Our goal is to assess the extent to which GPT-4 can accurately identify each praise criterion. We found that the zero-shot and few-shot chain-of-thought approaches yield comparable results. GPT-4 performs moderately well in identifying instances when the tutor offers specific and immediate praise. However, GPT-4 underperforms in identifying the tutor's ability to deliver sincere praise, particularly in the zero-shot prompting scenario, where examples of sincere tutor praise statements were not provided. Future work will focus on enhancing prompt engineering, developing a more general tutoring rubric, and evaluating our method using real-life tutoring dialogues.

28.0 · CY · Apr 27
Coasting Through Class: Learning Opportunity Loss from Practice Avoidance During Individual Seatwork

Ashish Gurung, Jordan Gutterman, Danielle R. Thomas et al.

Measures of disengagement provide insights into unproductive use of learning opportunities. Although measures of active disengagement, such as gaming the system and mind-wandering, are well studied, loss of practice time due to outright task avoidance remains relatively understudied. The current study addresses this gap by extending an existing within-task measure (idle time) with two new session-level measures (delayed start and early stop) to capture loss of practice time due to task avoidance. We characterize the combined lost time as coasted time and the associated behavior as coasting behavior. Using ASSISTments logs (N = 1,425), we find that students dedicate only 40% of available classwork time to math practice and coast through the remaining 60%. Of the coasted time, 36% resulted from delayed starts, 2% from mid-practice idling, and 62% from stopping early. Delayed start and early stop showed moderate temporal stability (G = 0.73 and 0.71, respectively), suggesting that coasting is a consistent behavioral pattern. Even after excluding early stops attributable to assignment completion (i.e., setting early stop = 0), coasted time remained substantial at 32%. While we observe significant differences in coasting by gender and IEP status, we observe no such differences by other demographic factors or school locale. Critically, students who continued working beyond the first assignment completion ("extra effort") performed significantly better on standardized tests. For research, coasting offers a new lens on opportunity loss by combining session-level disengagement with within-task disengagement. For practitioners, our results highlight the need for platform affordances that support sustained engagement and more productive use of available practice time.
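The coasted-time measure above is the sum of three components (delayed start, idle time, early stop) taken over the available classwork time. A minimal Python sketch of that arithmetic follows, assuming a hypothetical per-session record in which the three durations have already been extracted from logs; the `Session` fields and the example numbers are illustrative, not the ASSISTments log schema or study data.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One student's class session; all durations in minutes (hypothetical schema)."""
    available: float      # total classwork time available for practice
    delayed_start: float  # time before the first practice action
    idle: float           # mid-practice idle time (within-task measure)
    early_stop: float     # time between the last practice action and session end


def coasting_breakdown(sessions):
    """Aggregate practice vs. coasted time and each component's share of coasted time."""
    available = sum(s.available for s in sessions)
    delayed = sum(s.delayed_start for s in sessions)
    idle = sum(s.idle for s in sessions)
    early = sum(s.early_stop for s in sessions)
    coasted = delayed + idle + early  # coasted time = delayed start + idle + early stop
    if available <= 0:
        raise ValueError("available time must be positive")
    if coasted > available:
        raise ValueError("coasted time must not exceed available time")

    def share(part):
        # Fraction of coasted time contributed by one component
        return part / coasted if coasted else 0.0

    return {
        "practice_share": (available - coasted) / available,
        "coasted_share": coasted / available,
        "delayed_start": share(delayed),
        "idle": share(idle),
        "early_stop": share(early),
    }


sessions = [Session(45, 10, 1, 16), Session(45, 6, 0, 20)]
print(coasting_breakdown(sessions))
```

With these made-up sessions, 53 of 90 available minutes are coasted, so the three component shares sum to 1 and the practice share is 37/90.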

37.7 · HC · May 20
Simulating Learners' Task-Selection Strategies and System Constraints in Mastery Learning

Haley Noh, Aarna Chowdhary, Jeroen Ooge et al.

Intelligent Tutoring Systems often grant learners shared control over skill and problem selection. Prior work suggests learners exhibit diverse task-selection strategies, such as avoiding challenge, which may interact with mastery learning systems that optimize task selection based on estimated knowledge. Algorithmic constraints on problem selection may help mitigate these effects, but testing such constraints in classrooms is costly. We propose a simulation-based framework to examine how learner task-selection strategies and system constraints shape mastery learning efficiency. Using interaction data from 261 students across two mathematical domains (equation solving and graph interpretation), we simulate strategies such as Weakness Targeting and Interleaving. We evaluate how these strategies affect overpractice as a measure of efficiency. Results show substantial variability across strategies, with risk-averse strategies producing higher levels of overpractice, especially for complex multi-step problems. Targeted system constraints significantly reduce inefficiencies for maladaptive strategies while minimally affecting already efficient strategies. These findings show how simulation grounded in student data can guide the redesign of shared-control tutoring systems before classroom deployment.
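Overpractice, the efficiency measure used above, is commonly operationalized as practice opportunities on a skill taken after the learner model already estimates it as mastered. The sketch below assumes Bayesian Knowledge Tracing (BKT) with a 0.95 mastery threshold and illustrative parameter values; the abstract does not specify the learner model, and this is not the authors' simulation framework, which additionally models task-selection strategies and system constraints.

```python
def bkt_update(p_know, correct, slip, guess, transit):
    """One Bayesian Knowledge Tracing step: condition on the observed outcome, then apply learning."""
    if correct:
        num = p_know * (1 - slip)
        den = num + (1 - p_know) * guess
    else:
        num = p_know * slip
        den = num + (1 - p_know) * (1 - guess)
    posterior = num / den
    return posterior + (1 - posterior) * transit


def count_overpractice(outcomes, p_init=0.2, slip=0.1, guess=0.2,
                       transit=0.15, threshold=0.95):
    """Count practice opportunities taken after the skill was already estimated as mastered.

    Parameter values are illustrative defaults, not fitted to any dataset.
    """
    p_know = p_init
    overpractice = 0
    for correct in outcomes:
        if p_know >= threshold:
            overpractice += 1  # opportunity spent on an already-mastered skill
        p_know = bkt_update(p_know, correct, slip, guess, transit)
    return overpractice


print(count_overpractice([True] * 20))
```

With these parameters the mastery estimate crosses 0.95 after three correct answers, so 17 of the 20 opportunities count as overpractice. In a mastery-learning tutor, such opportunities typically arise in multi-step problems, where steps on mastered skills must still be completed to reach unmastered ones.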

HC · Dec 16, 2024 · Code
Combining Large Language Models with Tutoring System Intelligence: A Case Study in Caregiver Homework Support

Devika Venugopalan, Ziwen Yan, Conrad Borchers et al. · cmu

Caregivers (i.e., parents and members of a child's caring community) are underappreciated stakeholders in learning analytics. Although caregiver involvement can enhance student academic outcomes, many obstacles hinder involvement, most notably knowledge gaps with respect to modern school curricula. An emerging topic of interest in learning analytics is hybrid tutoring, which includes instructional and motivational support. Caregivers take on similar roles during homework, yet it is unknown how learning analytics can support them. Our past work with caregivers suggested that conversational support is a promising method of providing caregivers with the guidance needed to effectively support student learning. We developed a system that provides instructional support to caregivers through conversational recommendations generated by a Large Language Model (LLM). To address known instructional limitations of LLMs, we use instructional intelligence from tutoring systems while conducting prompt engineering experiments with the open-source Llama 3 LLM. This LLM generated message recommendations for caregivers supporting their child's math practice via chat. Few-shot prompting and combining real-time problem-solving context from tutoring systems with examples of tutoring practices yielded desirable message recommendations. These recommendations were evaluated with ten middle school caregivers, who valued recommendations facilitating content-level support and student metacognition through self-explanation. We contribute insights into how tutoring systems can best be merged with LLMs to support hybrid tutoring settings through conversational assistance, facilitating effective caregiver involvement in tutoring systems.

38.4 · HC · Mar 31
Evaluating a Data-Driven Redesign Process for Intelligent Tutoring Systems

Qianru Lyu, Conrad Borchers, Meng Xia et al.

Past research has defined a general process for the data-driven redesign of educational technologies and has shown that, in carefully selected instances, this process can help make systems more effective. In the current work, we test the generality of the approach by applying it to four units of a middle-school mathematics intelligent tutoring system that were selected based on topic rather than, as in previous work, on suitability for redesign. We tested whether the redesigned system was more effective than the original in a classroom study with 123 students. Although the learning gains did not differ between the conditions, students who used the Redesigned Tutor had more productive time-on-task, a larger number of skills practiced, and greater total knowledge mastery. The findings highlight the promise of data-driven redesign even when applied to instructional units *not* selected as likely to yield improvement, as evidence of the generality and wide applicability of the method.

29.6 · LG · May 12
From Heuristics to Analytics: Forecasting Effort and Progress in Online Learning

Eric S. Qiu, Danielle R. Thomas, Boyuan Guo et al.

Sustained effort is essential for realizing the benefits of intelligent tutoring systems (ITS), yet many learners disengage or underuse available practice time. We introduce engagement forecasting as a supervised prediction task based on ITS logs, targeting two outcomes central to effort and learning progress: minutes practiced per week and new skills mastered per week. Using interaction log data from 425 middle-school students over a school year, we benchmark fifteen predictors including regressions, decision trees, and neural networks. We show that these feature-based models reduce mean absolute error (MAE) by 22-33% relative to heuristic baselines, including fixed-percentile rules adapted from prior work in other behavioral domains. We find that percentile heuristics systematically overpredict, whereas feature-based models better track student practice trajectories across weeks. To support explainability, we analyze feature importance and ablations, revealing target-specific patterns: effort forecasting is driven mainly by recent activity features, while progress forecasting depends more on learner-state and content difficulty signals. Finally, in a semi-structured user interview case study with eight college tutors, we examine how tutors reasoned about system-generated predictive features when setting goals with students. We find that tutors reasoned differently about effort versus progress goals in ways that mirror our pattern analysis. Together, these results establish a reproducible benchmark for forecasting weekly effort and learning progress in ITS. By making patterns of sustained effort and progress visible at a weekly timescale, engagement forecasting offers a foundation for supporting tutor-learner goal setting and timely instructional decisions.
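The abstract reports feature-based models reducing MAE by 22-33% relative to heuristic baselines, including fixed-percentile rules. A minimal sketch of how such a comparison is computed follows; the nearest-rank percentile rule, the default `q=0.75`, and all numbers are assumptions for illustration, not the paper's baseline definitions or data.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted weekly values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def percentile_baseline(history, q=0.75):
    """Hypothetical fixed-percentile rule: predict next week as the q-th
    nearest-rank percentile of the student's past weekly values."""
    ordered = sorted(history)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def relative_mae_reduction(y_true, baseline_pred, model_pred):
    """Fractional MAE reduction of a model relative to a baseline (0.25 means 25% lower).

    Assumes the baseline MAE is nonzero.
    """
    base = mae(y_true, baseline_pred)
    return (base - mae(y_true, model_pred)) / base


# Made-up minutes practiced in four past weeks for one student
history = [30, 45, 20, 60]
print(percentile_baseline(history))
```

A high percentile of past weeks tends to sit above typical future weeks, which is one plausible reading of why percentile heuristics systematically overpredict.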

38.7 · CY · May 11
Improving Hybrid Human-AI Tutoring by Differentiating Human Tutor Roles Based on Student Needs

Ashish Gurung, Ge Gao, Jordan Gutterman et al.

Hybrid human-AI tutoring, where technology and humans jointly facilitate student learning, can be more beneficial than AI-only tutoring. However, preliminary evidence suggests that lower-performing students derive greater benefit from human-AI tutoring than higher-performing students. As such, this study evaluates whether a differentiated tutoring policy can effectively support both groups: human tutors proactively initiate support for lower-performing students, while higher-performing students receive reactive, on-demand support. Using the within-grade median of state test scores as the cutoff, we assigned 635 students (grades 5-8) to receive proactive (below the median) or reactive (at or above the median) tutoring. Using a difference-in-discontinuities (DiDC) design, we compare outcomes across two time periods: fall (AI-only tutoring) and spring (proactive-reactive human-AI tutoring). This quasi-experimental design isolates the effects of proactive-reactive tutoring approaches by comparing the discontinuity in spring outcomes to the fall, where no such discontinuity existed. Using data within a bandwidth around the cutoff selected by the Imbens-Kalyanaraman criterion, we find significant overall improvements from human-AI tutoring compared to the AI-only baseline: a 25% increase in time on task, 36% in skill proficiency, and 61% in academic growth (standardized MAP test). Between proactive and reactive tutoring, we find comparable improvements in time on task and skill proficiency. However, proactive tutoring, on average, showed marginally higher MAP growth (75%, p = .065) than reactive tutoring, suggesting that proactive tutoring was more beneficial to students farther below the cutoff and helped narrow achievement gaps. Our findings provide evidence that differentiated human-AI tutoring addresses the needs of both groups, offering a practical and cost-effective strategy for scaling hybrid instruction.
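The assignment rule above (proactive below the within-grade median, reactive at or above it) can be sketched as follows. The tuple layout is a hypothetical schema, and the sketch only produces the condition and the cutoff-centered score that a discontinuity design uses as its running variable; the DiDC estimation and Imbens-Kalyanaraman bandwidth selection are not shown.

```python
from collections import defaultdict
from statistics import median


def assign_tutoring(students):
    """students: iterable of (student_id, grade, state_test_score) tuples (hypothetical schema).

    Returns {student_id: (condition, centered_score)} using a within-grade median cutoff.
    """
    students = list(students)
    scores_by_grade = defaultdict(list)
    for _, grade, score in students:
        scores_by_grade[grade].append(score)
    cutoffs = {grade: median(scores) for grade, scores in scores_by_grade.items()}

    assignment = {}
    for sid, grade, score in students:
        centered = score - cutoffs[grade]  # running variable for a discontinuity analysis
        condition = "proactive" if centered < 0 else "reactive"  # < median vs. >= median
        assignment[sid] = (condition, centered)
    return assignment


roster = [("a", 5, 200), ("b", 5, 210), ("c", 5, 220), ("d", 6, 190), ("e", 6, 230)]
print(assign_tutoring(roster))
```

Computing the median separately per grade keeps the cutoff comparable within each grade, and a student exactly at the median is assigned to reactive tutoring, matching the "at or above the median" rule.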

AI · Oct 14, 2024
TRESTLE: A Model of Concept Formation in Structured Domains

Christopher J. MacLellan, Erik Harpstead, Vincent Aleven et al. · gatech

The literature on concept formation has demonstrated that humans are capable of learning concepts incrementally, with a variety of attribute types, and in both supervised and unsupervised settings. Many models of concept formation focus on a subset of these characteristics, but none account for all of them. In this paper, we present TRESTLE, an incremental account of probabilistic concept formation in structured domains that unifies prior concept learning models. TRESTLE works by creating a hierarchical categorization tree that can be used to predict missing attribute values and cluster sets of examples into conceptually meaningful groups. It updates its knowledge by partially matching novel structures and sorting them into its categorization tree. Finally, the system supports mixed-data representations, including nominal, numeric, relational, and component attributes. We evaluate TRESTLE's performance on a supervised learning task and an unsupervised clustering task. For both tasks, we compare it to a nonincremental model and to human participants. We find that this new categorization model is competitive with the nonincremental approach and more closely approximates human behavior on both tasks. These results serve as an initial demonstration of TRESTLE's capabilities and show that, by taking key characteristics of human learning into account, it can better model behavior than approaches that ignore them.

CL · May 2, 2024
How Can I Get It Right? Using GPT to Rephrase Incorrect Trainee Responses

Jionghao Lin, Zifei Han, Danielle R. Thomas et al. · cmu

One-on-one tutoring is widely acknowledged as an effective instructional method, provided that tutors are qualified. However, the high demand for qualified tutors remains a challenge, often necessitating the training of novice tutors (i.e., trainees) to ensure effective tutoring. Research suggests that providing timely explanatory feedback can facilitate the training process for trainees. However, providing such feedback is challenging because assessing trainee performance by human experts is time-consuming. Inspired by recent advances in large language models (LLMs), our study employed the GPT-4 model to build an explanatory feedback system. This system classifies trainees' responses as correct or incorrect and automatically provides template-based feedback with responses appropriately rephrased by the GPT-4 model. We conducted our study on 410 responses from trainees across three training lessons: Giving Effective Praise, Reacting to Errors, and Determining What Students Know. Our findings indicate that: 1) using a few-shot approach, the GPT-4 model effectively identifies correct and incorrect trainee responses from the three training lessons with an average F1 score of 0.84 and an AUC score of 0.85; and 2) using the few-shot approach, the GPT-4 model adeptly rephrases incorrect trainee responses into desired responses, achieving performance comparable to that of human experts.

CY · Dec 9, 2023
Using Think-Aloud Data to Understand Relations between Self-Regulation Cycle Characteristics and Student Performance in Intelligent Tutoring Systems

Conrad Borchers, Jiayi Zhang, Ryan S. Baker et al.

Numerous studies demonstrate the importance of self-regulation during learning by problem-solving. Recent work in learning analytics has largely examined students' use of self-regulated learning (SRL) in relation to overall learning gains. Limited research has related SRL to in-the-moment performance differences among learners. The present study investigates SRL behaviors in relation to learners' moment-by-moment performance while working with intelligent tutoring systems for stoichiometry chemistry. We demonstrate the feasibility of labeling SRL behaviors based on AI-generated think-aloud transcripts, identifying the presence or absence of four SRL categories (processing information, planning, enacting, and realizing errors) in each utterance. Using the SRL codes, we conducted regression analyses to examine how the use of SRL in terms of presence, frequency, cyclical characteristics, and recency relates to student performance on subsequent steps in multi-step problems. A model considering students' SRL cycle characteristics outperformed a model using only in-the-moment SRL assessment. In line with theoretical predictions, students' actions during earlier, process-heavy stages of SRL cycles exhibited lower moment-by-moment correctness during problem-solving than actions during later SRL cycle stages. We discuss system redesign opportunities to add SRL support during stages of processing, as well as paths forward for using machine learning to speed up research that depends on assessing SRL from transcribed think-aloud data.

23.9 · HC · Apr 28
Designing and Evaluating Next-Generation Learning Interfaces: Linking AI, HCI, and the Learning Sciences

Meng Xia, Yan Chen, Qiao Jin et al.

This workshop brings together researchers and practitioners from AI, HCI, and the learning sciences to explore how interactive systems can better support learning. We focus on the design and evaluation of human-AI collaborative learning interfaces that are technically robust, human-centered, and pedagogically grounded. By fostering interdisciplinary dialogue, the workshop aims to identify shared challenges, design principles, and research directions for next-generation learning technologies.

CY · Dec 17, 2023
Revealing Networks: Understanding Effective Teacher Practices in AI-Supported Classrooms using Transmodal Ordered Network Analysis

Conrad Borchers, Yeyu Wang, Shamya Karumbaiah et al.

Learning analytics research increasingly studies classroom learning with AI-based systems through rich contextual data from outside these systems, especially student-teacher interactions. One key challenge in leveraging such data is generating meaningful insights into effective teacher practices. Quantitative ethnography bears the potential to close this gap by combining multimodal data streams into networks of co-occurring behavior that drive insight into favorable learning conditions. The present study uses transmodal ordered network analysis to understand effective teacher practices in relation to traditional metrics of in-system learning in a mathematics classroom working with AI tutors. Incorporating teacher practices captured by position tracking and human observation codes into the modeling significantly improved inference about how efficiently students improved in the AI tutor, compared with a model using only tutor log data features. Comparing teacher practices by student learning rates, we find that students with low learning rates exhibited more hint use after monitoring. However, after an extended visit, students with low learning rates showed learning behavior similar to that of their high-learning-rate peers, achieving repeated correct attempts in the tutor. Observation notes suggest that differences in conceptual and procedural support can help explain visit effectiveness. Taken together, offering early conceptual support to students with low learning rates could make classroom practice with AI tutors more effective. This study advances the scientific understanding of effective teacher practice in classrooms learning with AI tutors and of methodologies to make such practices visible.

9.9 · HC · Apr 6
Balancing Teacher and Student Agency: Co-Orchestration Tool Design Supporting Real-Time Dynamic Pairing

Kexin Bella Yang, Menghan Liu, Liyi Xu et al.

In human-AI interaction, respecting user agency is essential for fostering trust and sustaining effective use of technology. In educational settings, dynamically integrating individual and collaborative learning offers pedagogical value by supporting personalized, self-paced learning experiences. Prior research has demonstrated the feasibility of this approach through intelligent tutoring systems and human-AI co-orchestration tools. However, how to balance teacher and student control in this process remains largely unexplored. This work explores the design space of how control can be distributed between teachers and students across the orchestration process, using participatory speed dating and a mixed-method analysis. We focus on three stages of the pairing process (before, during, and after pairing) in the context of designing classroom orchestration tools that support teachers in dynamically coordinating student transitions between individual practice and collaborative problem-solving. This work contributes empirical insights to the fields of educational technology and HCI by framing these findings within a theoretical design space that emphasizes the balance of multi-stakeholder agency and control. We propose design recommendations for achieving hybrid control in analytics-based orchestration tools in pairing contexts. Specifically, we recommend providing structured teacher guidance at the beginning while progressively increasing student autonomy as activities unfold.

CY · Jan 17, 2025
An Integrated Platform for Studying Learning with Intelligent Tutoring Systems: CTAT+TutorShop

Vincent Aleven, Conrad Borchers, Yun Huang et al.

Intelligent tutoring systems (ITSs) are effective in helping students learn; further research could make them even more effective. Particularly desirable is research into how students learn with these systems, how these systems best support student learning, and what learning sciences principles are key in ITSs. CTAT+Tutorshop provides a full-stack integrated platform that facilitates a complete research lifecycle with ITSs, which includes using ITS data to discover learner challenges, to identify opportunities for system improvements, and to conduct experimental studies. The platform includes authoring tools to support and accelerate the development of ITSs; these tools provide automatic data logging in a format compatible with DataShop, an independent site that supports the analysis of ed tech log data to study student learning. Among the many technology platforms that exist to support learning sciences research, CTAT+Tutorshop may be the only one that offers researchers the possibility to author elements of ITSs, or whole ITSs, as part of designing studies. This platform has been used to develop and conduct an estimated 147 research studies that have run in a wide variety of laboratory and real-world educational settings, including K-12 and higher education, and have addressed a wide range of research questions. This paper presents five case studies of research conducted on the CTAT+Tutorshop platform and summarizes what has been accomplished and what is possible for future researchers. We reflect on the distinctive elements of this platform that have made it so effective in facilitating a wide range of ITS research.

CY · Jun 21, 2025
Optimizing Mastery Learning by Fast-Forwarding Over-Practice Steps

Meng Xia, Robin Schmucker, Conrad Borchers et al.

Mastery learning improves learning proficiency and efficiency. However, overpractice of skills (students spending time on skills they have already mastered) remains a fundamental challenge for tutoring systems. Previous research has reduced overpractice through the development of better problem selection algorithms and the authoring of focused practice tasks. However, few efforts have concentrated on reducing overpractice through step-level adaptivity, which can avoid resource-intensive curriculum redesign. We propose and evaluate Fast-Forwarding as a technique that enhances existing problem selection algorithms. Based on simulation studies informed by learner models and problem-solving pathways derived from real student data, Fast-Forwarding can reduce overpractice by up to one-third, as it does not require students to complete problem-solving steps if all remaining pathways are fully mastered. Fast-Forwarding is a flexible method that enhances any problem selection algorithm, though its effectiveness is highest for algorithms that preferentially select difficult problems. Therefore, our findings suggest that while Fast-Forwarding may improve student practice efficiency, the size of its practical impact may also depend on students' ability to stay motivated and engaged at higher levels of difficulty.
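The core condition of Fast-Forwarding as described above (skip the remaining steps only if all remaining pathways are fully mastered) can be sketched as a simple predicate. This assumes per-skill mastery estimates from some learner model, a 0.95 threshold, and pathways represented as lists of skill names; all three are assumptions for illustration rather than the paper's implementation.

```python
def can_fast_forward(remaining_pathways, p_mastery, threshold=0.95):
    """Return True if every remaining step on every solution pathway exercises a
    skill estimated as mastered, so the remaining steps could be skipped.

    remaining_pathways: list of pathways, each a list of skill names for the steps
        still to be completed on that pathway (hypothetical representation).
    p_mastery: dict mapping skill name -> estimated probability of mastery.
    Skills missing from p_mastery are treated as unmastered.
    """
    return all(
        p_mastery.get(skill, 0.0) >= threshold
        for pathway in remaining_pathways
        for skill in pathway
    )


# Two hypothetical pathways that share a final step
pathways = [["combine-like-terms", "divide-both-sides"],
            ["distribute", "divide-both-sides"]]
mastery = {"combine-like-terms": 0.97, "distribute": 0.90, "divide-both-sides": 0.98}
print(can_fast_forward(pathways, mastery))
```

Here the check fails because "distribute" is below threshold on the second pathway. Requiring mastery on every pathway, not just one, is a conservative choice: the student is never skipped past a step they might still need on an alternative solution path.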

HC · Apr 2, 2021
Designing for human-AI complementarity in K-12 education

Kenneth Holstein, Vincent Aleven

Recent work has explored how complementary strengths of humans and artificial intelligence (AI) systems might be productively combined. However, successful forms of human-AI partnership have rarely been demonstrated in real-world settings. We present the iterative design and evaluation of Lumilo, smart glasses that help teachers help their students in AI-supported classrooms by presenting real-time analytics about students' learning, metacognition, and behavior. Results from a field study conducted in K-12 classrooms indicate that students learn more when teachers and AI tutors work together during class. We discuss implications of this research for the design of human-AI partnerships. We argue for more participatory approaches to research and design in this area, in which practitioners and other stakeholders are deeply, meaningfully involved throughout the process. Furthermore, we advocate for theory-building and for principled approaches to the study of human-AI decision-making in real-world contexts.