Gautam Biswas

h-index: 12

33 papers

249 citations

Novelty: 46%

AI Score: 54

Ranked #28,873 of 201,326 authors (top 14%); #6,670 in LG (top 16%)

33 Papers

MA May 28

A Theory-Guided LLM Pedagogical Agent for STEM+C Scaffolding Without Over-Reliance

Clayton Cohn, Surya Rayala, Siyuan Guo et al.

LLM pedagogical agents are proliferating, yet recent findings have raised questions about their adherence to established theories of learning and, by extension, their educational value. Concerns regarding cognitive offloading, over-reliance, and "gaming" behaviors persist and remain largely unaddressed. In response, we developed Copa, an agentic, multi-agent, multimodal Collaborative Peer Agent for STEM+C learning. Copa is built on top of the Evidence-Decision-Feedback (EDF) framework, grounding its interactions in Social Cognitive Theory and Social Constructivism and promoting sense-making through adaptive, dialogic support rather than answer-seeking. In an authentic high school computational-modeling study (n=33 dyads), we demonstrate that Copa (1) supports students' confidence building and ability to verbalize conceptual understanding without causing dependence; and (2) provides adaptive feedback personalized to learners that is interpretable with respect to students' multimodal input data. These findings position theory-guided, multimodal LLM agents as a promising path toward classroom AI integration that amplifies students' reasoning rather than replacing it.

CV Jun 1

Diagnosis of Human Object Interaction Detectors for Real World Educational Applications

Divya Mereddy, Ashwin Tudur Sadashiva, Marcos Quinones-Grueiro et al.

Human-object interaction (HOI) recognition is critical for automatically analyzing student behavior in complex educational environments. Although state-of-the-art (SOTA) HOI detectors perform well on benchmark datasets, their performance often degrades when deployed in real-world training environments due to domain-specific objects, occlusions, and complex visual conditions. In this paper, we introduce a diagnosis-driven framework that integrates a triplet-level HOI error taxonomy with error-factor attribution analysis for real-world educational video data. We study this problem in the context of Critical Care Air Transport Team (CCATT) mixed-reality medical training. Based on an analysis of HOI failure modes and their causes, we develop a diagnosis-informed refinement strategy for adapting pretrained HOI models to the target domain. Experiments on the CCATT dataset show that this approach improves the macro-F1 score of a pretrained CDN model from 48.6 to 90.2 through targeted refinement guided by diagnosed error factors. These results highlight the value of detailed diagnostic analysis for informing targeted adaptation of HOI models in real-world educational environments.

MA Oct 18, 2023

MARVEL: Multi-Agent Reinforcement-Learning for Large-Scale Variable Speed Limits

Yuhang Zhang, Marcos Quinones-Grueiro, Zhiyao Zhang et al.

Variable Speed Limit (VSL) control is a promising highway traffic management strategy, deployed worldwide, that can enhance traffic safety by dynamically adjusting speed limits according to real-time traffic conditions. Most of the VSL control algorithms deployed so far are rule-based and lack generalizability under varying and complex traffic scenarios. In this work, we propose MARVEL (Multi-Agent Reinforcement-learning for large-scale Variable spEed Limits), a novel framework for large-scale VSL control on highway corridors with real-world deployment settings. MARVEL utilizes only sensing information observable in the real world as state input and learns through a reward structure that incorporates adaptability to traffic conditions, safety, and mobility, thereby enabling multi-agent coordination. With parameter sharing among all VSL agents, the proposed framework scales to cover corridors with many agents. The policies are trained in a microscopic traffic simulation environment, focusing on a short freeway stretch with 8 VSL agents spanning 7 miles. For testing, these policies are applied to a more extensive network with 34 VSL agents spanning 17 miles of I-24 near Nashville, TN, USA. The MARVEL-based method improves traffic safety by 63.4% compared to the no-control scenario and enhances traffic mobility by 58.6% compared to a state-of-the-practice algorithm that has been deployed on I-24. In addition, we conduct an explainability analysis to examine the decision-making process of the agents and explore the learned policy under different traffic conditions. Finally, we test the response of the policy learned from the simulation-based experiments with real-world data collected from I-24 and illustrate its deployment capability.

RO May 19, 2022

Concurrent Policy Blending and System Identification for Generalized Assistive Control

Luke Bhan, Marcos Quinones-Grueiro, Gautam Biswas

In this work, we address the problem of solving complex collaborative robotic tasks subject to multiple varying parameters. Our approach combines simultaneous policy blending with system identification to create generalized policies that are robust to changes in system parameters. We employ a blending network whose state space relies solely on parameter estimates from a system identification technique. As a result, this blending network learns how to handle parameter changes instead of trying to learn how to solve the task for a generalized parameter set simultaneously. We demonstrate our scheme's ability on a collaborative robot and human itching task in which the human has motor impairments. We then showcase our approach's efficiency with a variety of system identification techniques when compared to standard domain randomization.

MA Mar 24

Evidence-Decision-Feedback: Theory-Driven Adaptive Scaffolding for LLM Agents

Clayton Cohn, Siyuan Guo, Surya Rayala et al.

Multi-agent LLM architectures offer opportunities for pedagogical agents to help students construct domain knowledge and develop critical-thinking skills, yet many operate on a "one-size-fits-all" basis, limiting their ability to provide personalized support. To address this, we introduce Evidence-Decision-Feedback (EDF), a theoretical framework for adaptive scaffolding using LLMs. EDF integrates elements of intelligent tutoring systems and agentic behavior by organizing interactions around evidentiary inference, pedagogical decision-making, and adaptive feedback. We instantiate EDF through Copa, an agentic collaborative peer agent for STEM+C problem-solving. In an authentic high school classroom study, we show that EDF-guided interactions align feedback with students' demonstrated understanding and task mastery; promote gradual scaffold fading; and support interpretable, evidence-grounded explanations without fostering overreliance.

LG Aug 22, 2024

Multimodal Methods for Analyzing Learning and Training Environments: A Systematic Literature Review

Clayton Cohn, Eduardo Davalos, Caleb Vatral et al.

Recent technological advancements in multimodal machine learning--including the rise of large language models (LLMs)--have improved our ability to collect, process, and analyze diverse multimodal data such as speech, video, and eye gaze in learning and training contexts. While prior reviews have addressed individual components of the multimodal pipeline (e.g., conceptual models, data fusion), a comprehensive review of empirical methods in applied multimodal environments remains notably absent. This review addresses that gap, introducing a taxonomy and framework that capture both established practices and recent innovations driven by LLMs and generative AI. We identify five modality groups: Natural Language, Vision, Physiological Signals, Human-Centered Evidence, and Environment Logs. Our analysis shows that integrating modalities enables richer insights into learner and trainee behaviors, revealing latent patterns often overlooked by unimodal approaches. However, persistent challenges in multimodal data collection and integration continue to hinder the adoption of these systems in real-time classroom settings.

AI Feb 6

BEAGLE: Behavior-Enforced Agent for Grounded Learner Emulation

Hanchen David Wang, Clayton Cohn, Zifan Xu et al.

Simulating student learning behaviors in open-ended problem-solving environments holds potential for education research, from training adaptive tutoring systems to stress-testing pedagogical interventions. However, collecting authentic data is challenging due to privacy concerns and the high cost of longitudinal studies. While Large Language Models (LLMs) offer a promising path to student simulation, they suffer from competency bias, optimizing for efficient correctness rather than the erratic, iterative struggle characteristic of novice learners. We present BEAGLE, a neuro-symbolic framework that addresses this bias by incorporating Self-Regulated Learning (SRL) theory into a novel architecture. BEAGLE integrates three key technical innovations: (1) a semi-Markov model that governs the timing and transitions of cognitive behaviors and metacognitive behaviors; (2) Bayesian Knowledge Tracing with explicit flaw injection to enforce realistic knowledge gaps and "unknown unknowns"; and (3) a decoupled agent design that separates high-level strategy use from code generation actions to prevent the model from silently correcting its own intentional errors. In evaluations on Python programming tasks, BEAGLE significantly outperforms state-of-the-art baselines in reproducing authentic trajectories. In a human Turing test, users were unable to distinguish synthetic traces from real student data, achieving an accuracy indistinguishable from random guessing (52.8%).

CV May 16

AI-Assisted Competency Assessment from Egocentric Video in Simulation-Based Nursing Education

Hanchen David Wang, Yilin Liu, Madison J. Lee et al.

Assessing learner competency in clinical simulation requires expert observation that is time-intensive, difficult to scale, and subject to inter-rater variability. Vision-language models have emerged as a promising tool for understanding complex visual behavior. In this work, we investigate whether visual observations can provide educationally meaningful signals for competency assessment through a three-stage framework that (1) extracts action timelines from egocentric nursing simulation video using frozen visual encoders and few-shot learning, (2) derives sequence-level features and per-session recognition metrics, and (3) relates these to instructor-rated competency. Across 22 densely annotated sessions (3.8 hours, 493 actions), a frozen DINOv2 backbone with HMM Viterbi decoding achieves 57.4% MOF in leave-one-out 1-shot recognition. Surprisingly, we observe a negative trend between recognition accuracy and competency (rho = -0.524, p = 0.012 for mIoU), robust to six confound controls: more competent students produce diverse, harder-to-classify workflows, while simple sequence features show no such relationship. Per-item analysis identifies patient safety protocols and team communication as the expected behaviors most reflected in this pattern, and process model comparisons reveal that higher-competency students exhibit more protocol-consistent action transitions. These findings suggest that recognition accuracy may complement predicted action timelines as a pedagogically informative signal in automated competency assessment.

LG Jan 8

Using Large Language Models to Detect Socially Shared Regulation of Collaborative Learning

Jiayi Zhang, Conrad Borchers, Clayton Cohn et al.

The field of learning analytics has made notable strides in automating the detection of complex learning processes in multimodal data. However, most advancements have focused on individualized problem-solving instead of collaborative, open-ended problem-solving, which may offer both affordances (richer data) and challenges (low cohesion) to behavioral prediction. Here, we extend predictive models to automatically detect socially shared regulation of learning (SSRL) behaviors in collaborative computational modeling environments using embedding-based approaches. We leverage large language models (LLMs) as summarization tools to generate task-aware representations of student dialogue aligned with system logs. These summaries, combined with text-only embeddings, context-enriched embeddings, and log-derived features, were used to train predictive models. Results show that text-only embeddings often achieve stronger performance in detecting SSRL behaviors related to enactment or group dynamics (e.g., off-task behavior or requesting assistance). In contrast, contextual and multimodal features provide complementary benefits for constructs such as planning and reflection. Overall, our findings highlight the promise of embedding-based models for extending learning analytics by enabling scalable detection of SSRL behaviors, ultimately supporting real-time feedback and adaptive scaffolding in collaborative learning environments that teachers value.

CV Dec 29, 2025

Video-Based Performance Evaluation for ECR Drills in Synthetic Training Environments

Surya Rayala, Marcos Quinones-Grueiro, Naveeduddin Mohammed et al.

Effective urban warfare training requires situational awareness and muscle memory, developed through repeated practice in realistic yet controlled environments. A key drill, Enter and Clear the Room (ECR), demands threat assessment, coordination, and securing confined spaces. The military uses Synthetic Training Environments that offer scalable, controlled settings for repeated exercises. However, automatic performance assessment remains challenging, particularly when aiming for objective evaluation of cognitive, psychomotor, and teamwork skills. Traditional methods often rely on costly, intrusive sensors or subjective human observation, limiting scalability and accuracy. This paper introduces a video-based assessment pipeline that derives performance analytics from training videos without requiring additional hardware. By utilizing computer vision models, the system extracts 2D skeletons, gaze vectors, and movement trajectories. From these data, we develop task-specific metrics that measure psychomotor fluency, situational awareness, and team coordination. These metrics feed into an extended Cognitive Task Analysis (CTA) hierarchy, which employs a weighted combination to generate overall performance scores for teamwork and cognition. We demonstrate the approach with a case study of real-world ECR drills, providing actionable, domain specific metrics that capture individual and team performance. We also discuss how these insights can support After Action Reviews with interactive dashboards within Gamemaster and the Generalized Intelligent Framework for Tutoring (GIFT), providing intuitive and understandable feedback. We conclude by addressing limitations, including tracking difficulties, ground-truth validation, and the broader applicability of our approach. Future work includes expanding analysis to 3D video data and leveraging video analysis to enable scalable evaluation within STEs.

CV Aug 27, 2025 Code

WEBEYETRACK: Scalable Eye-Tracking for the Browser via On-Device Few-Shot Personalization

Eduardo Davalos, Yike Zhang, Namrata Srivastava et al.

With advancements in AI, new gaze estimation methods are exceeding state-of-the-art (SOTA) benchmarks, but their real-world application reveals a gap with commercial eye-tracking solutions. Factors like model size, inference time, and privacy often go unaddressed. Meanwhile, webcam-based eye-tracking methods lack sufficient accuracy, in particular due to head movement. To tackle these issues, we introduce WebEyeTrack, a framework that integrates lightweight SOTA gaze estimation models directly in the browser. It incorporates model-based head pose estimation and on-device few-shot learning with as few as nine calibration samples (k ≤ 9). WebEyeTrack adapts to new users, achieving SOTA performance with an error margin of 2.32 cm on GazeCapture and real-time inference speeds of 2.4 milliseconds on an iPhone 14. Our open-source code is available at https://github.com/RedForestAi/WebEyeTrack.

CL Mar 21, 2024

A Chain-of-Thought Prompting Approach with LLMs for Evaluating Students' Formative Assessment Responses in Science

Clayton Cohn, Nicole Hutchins, Tuan Le et al.

This paper explores the use of large language models (LLMs) to score and explain short-answer assessments in K-12 science. While existing methods can score more structured math and computer science assessments, they often do not provide explanations for the scores. Our study focuses on employing GPT-4 for automated assessment in middle school Earth Science, combining few-shot and active learning with chain-of-thought reasoning. Using a human-in-the-loop approach, we successfully score and provide meaningful explanations for formative assessment responses. A systematic analysis of our method's pros and cons sheds light on the potential for human-in-the-loop techniques to enhance automated grading for open-ended science assessments.

AI May 10, 2024

A First Step in Using Machine Learning Methods to Enhance Interaction Analysis for Embodied Learning Environments

Joyce Fonteles, Eduardo Davalos, Ashwin T. S. et al.

Investigating children's embodied learning in mixed-reality environments, where they collaboratively simulate scientific processes, requires analyzing complex multimodal data to interpret their learning and coordination behaviors. Learning scientists have developed Interaction Analysis (IA) methodologies for analyzing such data, but this requires researchers to watch hours of videos to extract and interpret students' learning patterns. Our study aims to simplify researchers' tasks, using Machine Learning and Multimodal Learning Analytics to support the IA processes. Our study combines machine learning algorithms and multimodal analyses to support and streamline researcher efforts in developing a comprehensive understanding of students' scientific engagement through their movements, gaze, and affective responses in a simulated scenario. To facilitate an effective researcher-AI partnership, we present an initial case study to determine the feasibility of visually representing students' states, actions, gaze, affect, and movement on a timeline. Our case study focuses on a specific science scenario where students learn about photosynthesis. The timeline allows us to investigate the alignment of critical learning moments identified by multimodal and interaction analysis, and uncover insights into students' temporal learning progressions.

LG Apr 21

Safe Continual Reinforcement Learning in Non-stationary Environments

Austin Coursey, Abel Diaz-Gonzalez, Marcos Quinones-Grueiro et al.

Reinforcement learning (RL) offers a compelling data-driven paradigm for synthesizing controllers for complex systems when accurate physical models are unavailable; however, most existing control-oriented RL methods assume stationarity and, therefore, struggle in real-world non-stationary deployments where system dynamics and operating conditions can change unexpectedly. Moreover, RL controllers acting in physical environments must satisfy safety constraints throughout their learning and execution phases, rendering transient violations during adaptation unacceptable. Although continual RL and safe RL have each addressed non-stationarity and safety, respectively, their intersection remains comparatively unexplored, motivating the study of safe continual RL algorithms that can adapt over the system's lifetime while preserving safety. In this work, we systematically investigate safe continual reinforcement learning by introducing three benchmark environments that capture safety-critical continual adaptation and by evaluating representative approaches from safe RL, continual RL, and their combinations. Our empirical results reveal a fundamental tension between maintaining safety constraints and preventing catastrophic forgetting under non-stationary dynamics, with existing methods generally failing to achieve both objectives simultaneously. To address this shortcoming, we examine regularization-based strategies that partially mitigate this trade-off and characterize their benefits and limitations. Finally, we outline key open challenges and research directions toward developing safe, resilient learning-based controllers capable of sustained autonomous operation in changing environments.

CL May 22, 2025

Personalizing Student-Agent Interactions Using Log-Contextualized Retrieval Augmented Generation (RAG)

Clayton Cohn, Surya Rayala, Caitlin Snyder et al.

Collaborative dialogue offers rich insights into students' learning and critical thinking, which is essential for personalizing pedagogical agent interactions in STEM+C settings. While large language models (LLMs) facilitate dynamic pedagogical interactions, hallucinations undermine confidence, trust, and instructional value. Retrieval-augmented generation (RAG) grounds LLM outputs in curated knowledge but requires a clear semantic link between user input and a knowledge base, which is often weak in student dialogue. We propose log-contextualized RAG (LC-RAG), which enhances RAG retrieval by using environment logs to contextualize collaborative discourse. Our findings show that LC-RAG improves retrieval over a discourse-only baseline and allows our collaborative peer agent, Copa, to deliver relevant, personalized guidance that supports students' critical thinking and epistemic decision-making in a collaborative computational modeling environment, C2STEM.

CL Apr 3, 2025

CoTAL: Human-in-the-Loop Prompt Engineering for Generalizable Formative Assessment Scoring

Clayton Cohn, Ashwin T S, Naveeduddin Mohammed et al.

Large language models (LLMs) have created new opportunities to assist teachers and support student learning. While researchers have explored various prompt engineering approaches in educational contexts, the degree to which these approaches generalize across domains--such as science, computing, and engineering--remains underexplored. In this paper, we introduce Chain-of-Thought Prompting + Active Learning (CoTAL), an LLM-based approach to formative assessment scoring that (1) leverages Evidence-Centered Design (ECD) to align assessments and rubrics with curriculum goals, (2) applies human-in-the-loop prompt engineering to automate response scoring, and (3) incorporates chain-of-thought (CoT) prompting and teacher and student feedback to iteratively refine questions, rubrics, and LLM prompts. Our findings demonstrate that CoTAL improves GPT-4's scoring performance across domains, achieving gains of up to 38.9% over a non-prompt-engineered baseline (i.e., without labeled examples, chain-of-thought prompting, or iterative refinement). Teachers and students judge CoTAL to be effective at scoring and explaining responses, and their feedback produces valuable insights that enhance grading accuracy and explanation quality.

CY Mar 3, 2025

LLMs as Educational Analysts: Transforming Multimodal Data Traces into Actionable Reading Assessment Reports

Eduardo Davalos, Yike Zhang, Namrata Srivastava et al.

Reading assessments are essential for enhancing students' comprehension, yet many EdTech applications focus mainly on outcome-based metrics, providing limited insights into student behavior and cognition. This study investigates the use of multimodal data sources -- including eye-tracking data, learning outcomes, assessment content, and teaching standards -- to derive meaningful reading insights. We employ unsupervised learning techniques to identify distinct reading behavior patterns, and then a large language model (LLM) synthesizes the derived information into actionable reports for educators, streamlining the interpretation process. LLM experts and human educators evaluate these reports for clarity, accuracy, relevance, and pedagogical usefulness. Our findings indicate that LLMs can effectively function as educational analysts, turning diverse data into teacher-friendly insights that are well-received by educators. While promising for automating insight generation, human oversight remains crucial to ensure reliability and fairness. This research advances human-centered AI in education, connecting data-driven analytics with practical classroom applications.

CL Aug 2, 2025

A Theory of Adaptive Scaffolding for LLM-Based Pedagogical Agents

Clayton Cohn, Surya Rayala, Namrata Srivastava et al.

Large language models (LLMs) present new opportunities for creating pedagogical agents that engage in meaningful dialogue to support student learning. However, the current use of LLM systems like ChatGPT in classrooms often lacks the solid theoretical foundation found in earlier intelligent tutoring systems. To bridge this gap, we propose a framework that combines Evidence-Centered Design with Social Cognitive Theory for adaptive scaffolding in LLM-based agents focused on STEM+C learning. We illustrate this framework with Inquizzitor, an LLM-based formative assessment agent that integrates human-AI hybrid intelligence and provides feedback grounded in cognitive science principles. Our findings show that Inquizzitor delivers high-quality assessment and interaction aligned with core learning theories, offering teachers effective guidance that students value. This research underscores the potential for theory-driven LLM integration in education, highlighting the ability of these systems to provide adaptive and principled instruction.

HC Jan 30, 2025

Beyond Instructed Tasks: Recognizing In-the-Wild Reading Behaviors in the Classroom Using Eye Tracking

Eduardo Davalos, Jorge Alberto Salas, Yike Zhang et al.

Understanding reader behaviors such as skimming, deep reading, and scanning is essential for improving educational instruction. While prior eye-tracking studies have trained models to recognize reading behaviors, they often rely on instructed reading tasks, which can alter natural behaviors and limit the applicability of these findings to in-the-wild settings. Additionally, there is a lack of clear definitions for reading behavior archetypes in the literature. We conducted a classroom study to address these issues by collecting instructed and in-the-wild reading data. We developed a mixed-method framework, including a human-driven theoretical model, statistical analyses, and an AI classifier, to differentiate reading behaviors based on their velocity, density, and sequentiality. Our lightweight 2D CNN achieved an F1 score of 0.8 for behavior recognition, providing a robust approach for understanding in-the-wild reading. This work advances our ability to provide detailed behavioral insights to educators, supporting more targeted and effective assessment and instruction.

CV Sep 22, 2025

Trainee Action Recognition through Interaction Analysis in CCATT Mixed-Reality Training

Divya Mereddy, Marcos Quinones-Grueiro, Ashwin T S et al.

This study examines how Critical Care Air Transport Team (CCATT) members are trained using mixed-reality simulations that replicate the high-pressure conditions of aeromedical evacuation. Each team - a physician, nurse, and respiratory therapist - must stabilize severely injured soldiers by managing ventilators, IV pumps, and suction devices during flight. Proficient performance requires clinical expertise and cognitive skills, such as situational awareness, rapid decision-making, effective communication, and coordinated task management, all of which must be maintained under stress. Recent advances in simulation and multimodal data analytics enable more objective and comprehensive performance evaluation. In contrast, traditional instructor-led assessments are subjective and may overlook critical events, thereby limiting generalizability and consistency. However, AI-based automated and more objective evaluation metrics still demand human input to train the AI algorithms to assess complex team dynamics in the presence of environmental noise and the need for accurate re-identification in multi-person tracking. To address these challenges, we introduce a systematic, data-driven assessment framework that combines Cognitive Task Analysis (CTA) with Multimodal Learning Analytics (MMLA). We have developed a domain-specific CTA model for CCATT training and a vision-based action recognition pipeline using a fine-tuned Human-Object Interaction model, the Cascade Disentangling Network (CDN), to detect and track trainee-equipment interactions over time. These interactions automatically yield performance indicators (e.g., reaction time, task duration), which are mapped onto a hierarchical CTA model tailored to CCATT operations, enabling interpretable, domain-relevant performance evaluations.

CY Sep 11, 2025

LearnLens: An AI-Enhanced Dashboard to Support Teachers in Open-Ended Classrooms

Namrata Srivastava, Shruti Jain, Clayton Cohn et al.

Exploratory learning environments (ELEs), such as simulation-based platforms and open-ended science curricula, promote hands-on exploration and problem-solving but make it difficult for teachers to gain timely insights into students' conceptual understanding. This paper presents LearnLens, a generative AI (GenAI)-enhanced teacher-facing dashboard designed to support problem-based instruction in middle school science. LearnLens processes students' open-ended responses from digital assessments to provide various insights, including sample responses, word clouds, bar charts, and AI-generated summaries. These features elucidate students' thinking, enabling teachers to adjust their instruction based on emerging patterns of understanding. The dashboard was informed by teacher input during professional development sessions and implemented within a middle school Earth science curriculum. We report insights from teacher interviews that highlight the dashboard's usability and potential to guide teachers' instruction in the classroom.

HC Sep 3, 2025

Designing Gaze Analytics for ELA Instruction: A User-Centered Dashboard with Conversational AI Support

Eduardo Davalos, Yike Zhang, Shruti Jain et al.

Eye-tracking offers rich insights into student cognition and engagement, but remains underutilized in classroom-facing educational technology due to challenges in data interpretation and accessibility. In this paper, we present the iterative design and evaluation of a gaze-based learning analytics dashboard for English Language Arts (ELA), developed through five studies involving teachers and students. Guided by user-centered design and data storytelling principles, we explored how gaze data can support reflection, formative assessment, and instructional decision-making. Our findings demonstrate that gaze analytics can be approachable and pedagogically valuable when supported by familiar visualizations, layered explanations, and narrative scaffolds. We further show how a conversational agent, powered by a large language model (LLM), can lower cognitive barriers to interpreting gaze data by enabling natural language interactions with multimodal learning analytics. We conclude with design implications for future EdTech systems that aim to integrate novel data modalities in classroom contexts.

LG Feb 21, 2025

On the Design of Safe Continual RL Methods for Control of Nonlinear Systems

Austin Coursey, Marcos Quinones-Grueiro, Gautam Biswas

Reinforcement learning (RL) algorithms have been successfully applied to control tasks associated with unmanned aerial vehicles and robotics. In recent years, safe RL has been proposed to allow the safe execution of RL algorithms in industrial and mission-critical systems that operate in closed loops. However, if the system operating conditions change, such as when an unknown fault occurs in the system, typical safe RL algorithms are unable to adapt while retaining past knowledge. Continual reinforcement learning algorithms have been proposed to address this issue. However, the impact of continual adaptation on the system's safety is an understudied problem. In this paper, we study the intersection of safe and continual RL. First, we empirically demonstrate that a popular continual RL algorithm, online elastic weight consolidation, is unable to satisfy safety constraints in non-linear systems subject to varying operating conditions. Specifically, we study the MuJoCo HalfCheetah and Ant environments with velocity constraints and sudden joint loss non-stationarity. Then, we show that an agent trained using constrained policy optimization, a safe RL algorithm, experiences catastrophic forgetting in continual learning settings. With this in mind, we explore a simple reward-shaping method to ensure that elastic weight consolidation prioritizes remembering both safety and task performance for safety-constrained, non-linear, and non-stationary dynamical systems.

LGJun 21, 2024

FT-AED: Benchmark Dataset for Early Freeway Traffic Anomalous Event Detection

Austin Coursey, Junyi Ji, Marcos Quinones-Grueiro et al.

Early and accurate detection of anomalous events on the freeway, such as accidents, can improve emergency response and clearance. However, existing delays and errors in event identification and reporting make it a difficult problem to solve. Current large-scale freeway traffic datasets are not designed for anomaly detection and ignore these challenges. In this paper, we introduce the first large-scale lane-level freeway traffic dataset for anomaly detection. Our dataset consists of a month of weekday radar detection sensor data collected in 4 lanes along an 18-mile stretch of Interstate 24 heading toward Nashville, TN, comprising over 3.7 million sensor measurements. We also collect official crash reports from the Nashville Traffic Management Center and manually label all other potential anomalies in the dataset. To show the potential for our dataset to be used in future machine learning and traffic research, we benchmark numerous deep learning anomaly detection models on our dataset. We find that unsupervised graph neural network autoencoders are a promising solution for this problem and that ignoring spatial relationships leads to decreased performance. We demonstrate that our methods can reduce reporting delays by over 10 minutes on average while detecting 75% of crashes. Our dataset and all preprocessing code needed to get started are publicly released at https://vu.edu/ft-aed/ to facilitate future research.

CLMay 6, 2024

Towards A Human-in-the-Loop LLM Approach to Collaborative Discourse Analysis

Clayton Cohn, Caitlin Snyder, Justin Montenegro et al.

LLMs have demonstrated proficiency in contextualizing their outputs using human input, often matching or beating human-level performance on a variety of tasks. However, LLMs have not yet been used to characterize synergistic learning in students' collaborative discourse. In this exploratory work, we take a first step towards adopting a human-in-the-loop prompt engineering approach with GPT-4-Turbo to summarize and categorize students' synergistic learning during collaborative discourse. Our preliminary findings suggest GPT-4-Turbo may be able to characterize students' synergistic learning in a manner comparable to humans and that our approach warrants further investigation.

SYMay 21, 2023

A Reinforcement Learning Approach for Robust Supervisory Control of UAVs Under Disturbances

Ibrahim Ahmed, Marcos Quinones-Grueiro, Gautam Biswas

In this work, we present an approach to supervisory reinforcement learning control for unmanned aerial vehicles (UAVs). UAVs are dynamic systems where control decisions in response to disturbances in the environment have to be made on the order of milliseconds. We formulate a supervisory control architecture that interleaves with extant embedded control and demonstrates robustness to environmental disturbances in the form of adverse wind conditions. We run case studies with a Tarot T-18 Octorotor to demonstrate the effectiveness of our approach and compare it against a classic cascade control architecture used in most vehicles. While the results show the performance difference is marginal for nominal operations, substantial performance improvement is obtained with the supervisory RL approach under unseen wind conditions.

SYMay 20, 2023

Model-based adaptation for sample efficient transfer in reinforcement learning control of parameter-varying systems

Ibrahim Ahmed, Marcos Quinones-Grueiro, Gautam Biswas

In this paper, we leverage ideas from model-based control to address the sample efficiency problem of reinforcement learning (RL) algorithms. Accelerating learning is an active field of RL highly relevant in the context of time-varying systems. Traditional transfer learning methods propose to use prior knowledge of the system behavior to devise a gradual or immediate data-driven transformation of the control policy obtained through RL. Such transformation is usually computed by estimating the performance of previous control policies based on measurements recently collected from the system. However, such retrospective measures have debatable utility with no guarantees of positive transfer in most cases. Instead, we propose a model-based transformation, such that when actions from a control policy are applied to the target system, a positive transfer is achieved. The transformation can be used as an initialization for the reinforcement learning process to converge to a new optimum. We validate the performance of our approach through four benchmark examples. We demonstrate that our approach is more sample-efficient than fine-tuning with reinforcement learning alone and achieves comparable performance to linear-quadratic-regulators and model-predictive control when an accurate linear model is known in the three cases. If an accurate model is not known, we empirically show that the proposed approach still guarantees positive transfer with jump-start improvement.

LGDec 10, 2020

Performance-Weighed Policy Sampling for Meta-Reinforcement Learning

Ibrahim Ahmed, Marcos Quinones-Grueiro, Gautam Biswas

This paper discusses an Enhanced Model-Agnostic Meta-Learning (E-MAML) algorithm that generates fast convergence of the policy function from a small number of training examples when applied to new learning tasks. Built on top of Model-Agnostic Meta-Learning (MAML), E-MAML maintains a set of policy parameters learned in the environment for previous tasks. We apply E-MAML to developing reinforcement learning (RL)-based online fault-tolerant control schemes for dynamic systems. The enhancement is applied when a new fault occurs, to re-initialize the parameters of a new RL policy that achieves faster adaptation with a small number of samples of system behavior with the new fault. This replaces the random task sampling step in MAML. Instead, it exploits the extant previously generated experiences of the controller. The enhancement is sampled to maximally span the parameter space to facilitate adaptation to the new fault. We demonstrate the performance of our approach combining E-MAML with proximal policy optimization (PPO) on the well-known cart pole example, and then on the fuel transfer system of an aircraft.

LGSep 26, 2020

Complementary Meta-Reinforcement Learning for Fault-Adaptive Control

Ibrahim Ahmed, Marcos Quinones-Grueiro, Gautam Biswas

Faults are endemic to all systems. Adaptive fault-tolerant control maintains degraded performance when faults occur as opposed to unsafe conditions or catastrophic events. In systems with abrupt faults and strict time constraints, it is imperative for control to adapt quickly to system changes to maintain system operations. We present a meta-reinforcement learning approach that quickly adapts its control policy to changing conditions. The approach builds upon model-agnostic meta-learning (MAML). The controller maintains a complement of prior policies learned under system faults. This "library" is evaluated on a system after a new fault to initialize the new policy. This contrasts with MAML, where the controller derives intermediate policies anew, sampled from a distribution of similar systems, to initialize a new policy. Our approach improves sample efficiency of the reinforcement learning process. We evaluate our approach on an aircraft fuel transfer system under abrupt faults.
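
The library-evaluation step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not say how the evaluated policies are combined, so the sketch simply picks the single best-scoring prior policy as the starting point. The function `select_initial_policy`, the toy scalar "policies", and the scoring function are all hypothetical.

```python
def select_initial_policy(library, evaluate):
    """Score every stored policy on the faulty system and return the best one.

    library  -- dict mapping a policy name to its parameters
    evaluate -- callable returning a scalar score (higher is better)
    """
    scores = {name: evaluate(policy) for name, policy in library.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy stand-in: each "policy" is a single controller gain, and the faulty
# system is assumed to behave best with a gain near 0.5.
library = {"nominal": 1.0, "leak": 0.6, "pump_degraded": 0.3}


def evaluate_on_faulty_system(gain):
    return -(gain - 0.5) ** 2


best_name, scores = select_initial_policy(library, evaluate_on_faulty_system)
initial_params = library[best_name]  # starting point for further RL updates
```

In practice the evaluation would be a short rollout on the faulty system, so the cost of this step grows with the number of stored policies.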

SYAug 10, 2020

Fault-Tolerant Control of Degrading Systems with On-Policy Reinforcement Learning

Ibrahim Ahmed, Marcos Quiñones-Grueiro, Gautam Biswas

We propose a novel adaptive reinforcement learning control approach for fault tolerant control of degrading systems that is not preceded by a fault detection and diagnosis step. Therefore, a priori knowledge of faults that may occur in the system is not required. The adaptive scheme combines online and offline learning of the on-policy control method to improve exploration and sample efficiency, while guaranteeing stable learning. The offline learning phase is performed using a data-driven model of the system, which is frequently updated to track the system's operating conditions. We conduct experiments on an aircraft fuel transfer system to demonstrate the effectiveness of our approach.

SYAug 10, 2020

Comparison of Model Predictive and Reinforcement Learning Methods for Fault Tolerant Control

Ibrahim Ahmed, Hamed Khorasgani, Gautam Biswas

A desirable property in fault-tolerant controllers is adaptability to system changes as they evolve during system operation. An adaptive controller does not require optimal control policies to be enumerated for possible faults. Instead, it can approximate one in real time. We present two adaptive fault-tolerant control schemes for a discrete time system based on hierarchical reinforcement learning. We compare their performance against a model predictive controller in the presence of sensor noise and persistent faults. The controllers are tested on a fuel tank model of a C-130 plane. Our experiments demonstrate that reinforcement learning-based controllers perform more robustly than model predictive controllers under faults, partially observable system models, and varying sensor noise levels.

LGAug 4, 2020

A Relearning Approach to Reinforcement Learning for Control of Smart Buildings

Avisek Naug, Marcos Quiñones-Grueiro, Gautam Biswas

This paper demonstrates that continual relearning of control policies using incremental deep reinforcement learning (RL) can improve policy learning for non-stationary processes. We demonstrate this approach for a data-driven 'smart building environment' that we use as a test-bed for developing HVAC controllers for reducing energy consumption of large buildings on our university campus. The non-stationarity in building operations and weather patterns makes it imperative to develop control strategies that are adaptive to changing conditions. On-policy RL algorithms, such as Proximal Policy Optimization (PPO), represent an approach for addressing this non-stationarity, but exploration on the actual system is not an option for safety-critical systems. As an alternative, we develop an incremental RL technique that simultaneously reduces building energy consumption without sacrificing overall comfort. We compare the performance of our incremental RL controller to that of a static RL controller that does not implement the relearning function. The performance of the static controller diminishes significantly over time, but the relearning controller adjusts to changing conditions while ensuring comfort and optimal energy performance.

AIMar 27, 2013

Using the Dempster-Shafer Scheme in a Diagnostic Expert System Shell

Gautam Biswas, Teywansh S. Anand

This paper discusses an expert system shell that integrates rule-based reasoning and the Dempster-Shafer evidence combination scheme. Domain knowledge is stored as rules with associated belief functions. The reasoning component uses a combination of forward and backward inferencing mechanisms to allow interaction with users in a mixed-initiative format.
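
For readers unfamiliar with the evidence combination scheme mentioned here, the sketch below implements Dempster's rule of combination for two mass functions. It illustrates the general rule only, not the shell's actual implementation; the fault hypotheses and mass values are made up for the example.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule: combine two mass functions keyed by frozenset hypotheses."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            # Disjoint sets contribute to the conflict mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("Sources are in total conflict; the rule is undefined.")
    # Renormalize so the remaining masses sum to 1.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


# Hypothetical diagnostic example: two rules give evidence about a fault.
frame = frozenset({"pump", "valve", "sensor"})
rule_a = {frozenset({"pump"}): 0.6, frame: 0.4}
rule_b = {frozenset({"pump", "valve"}): 0.7, frame: 0.3}
belief = combine(rule_a, rule_b)
```

Mass assigned to the whole frame represents ignorance, which is what lets a rule commit only part of its belief to a specific hypothesis.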