Min Chi

LG
h-index8
27papers
216citations
Novelty49%
AI Score50

27 Papers

SEJun 7, 2022
Code-DKT: A Code-based Knowledge Tracing Model for Programming Tasks

Yang Shi, Min Chi, Tiffany Barnes et al.

Knowledge tracing (KT) models are a popular approach for predicting students' future performance at practice problems using their prior attempts. Though many innovations have been made in KT, most models including the state-of-the-art Deep KT (DKT) mainly leverage each student's response either as correct or incorrect, ignoring its content. In this work, we propose Code-based Deep Knowledge Tracing (Code-DKT), a model that uses an attention mechanism to automatically extract and select domain-specific code features to extend DKT. We compared the effectiveness of Code-DKT against Bayesian and Deep Knowledge Tracing (BKT and DKT) on a dataset from a class of 50 students attempting to solve 5 introductory programming assignments. Our results show that Code-DKT consistently outperforms DKT by 3.07-4.00% AUC across the 5 assignments, a comparable improvement to other state-of-the-art domain-general KT models over DKT. Finally, we analyze problem-specific performance through a set of case studies for one assignment to demonstrate when and how code features improve Code-DKT's predictions.

CYApr 17, 2023
Leveraging Deep Reinforcement Learning for Metacognitive Interventions across Intelligent Tutoring Systems

Mark Abdelshiheed, John Wesley Hostetter, Tiffany Barnes et al.

This work compares two approaches to provide metacognitive interventions and their impact on preparing students for future learning across Intelligent Tutoring Systems (ITSs). In two consecutive semesters, we conducted two classroom experiments: Exp. 1 used a classic artificial intelligence approach to classify students into different metacognitive groups and provide static interventions based on their classified groups. In Exp. 2, we leveraged Deep Reinforcement Learning (DRL) to provide adaptive interventions that consider the dynamic changes in the student's metacognitive levels. In both experiments, students received these interventions that taught how and when to use a backward-chaining (BC) strategy on a logic tutor that supports a default forward-chaining strategy. Six weeks later, we trained students on a probability tutor that only supports BC without interventions. Our results show that adaptive DRL-based interventions closed the metacognitive skills gap between students. In contrast, static classifier-based interventions only benefited a subset of students who knew how to use BC in advance. Additionally, our DRL agent prepared the experimental students for future learning by significantly surpassing their control peers on both ITSs.

HCMar 18, 2023
Mixing Backward- with Forward-Chaining for Metacognitive Skill Acquisition and Transfer

Mark Abdelshiheed, John Wesley Hostetter, Xi Yang et al.

Metacognitive skills have been commonly associated with preparation for future learning in deductive domains. Many researchers have regarded strategy- and time-awareness as two metacognitive skills that address how and when to use a problem-solving strategy, respectively. It was shown that students who are both strategy-and time-aware (StrTime) outperformed their nonStrTime peers across deductive domains. In this work, students were trained on a logic tutor that supports a default forward-chaining (FC) and a backward-chaining (BC) strategy. We investigated the impact of mixing BC with FC on teaching strategy- and time-awareness for nonStrTime students. During the logic instruction, the experimental students (Exp) were provided with two BC worked examples and some problems in BC to practice how and when to use BC. Meanwhile, their control (Ctrl) and StrTime peers received no such intervention. Six weeks later, all students went through a probability tutor that only supports BC to evaluate whether the acquired metacognitive skills are transferred from logic. Our results show that on both tutors, Exp outperformed Ctrl and caught up with StrTime.

CYJul 27, 2022
Investigating the Impact of Backward Strategy Learning in a Logic Tutor: Aiding Subgoal Learning towards Improved Problem Solving

Preya Shabrina, Behrooz Mostafavi, Mark Abdelshiheed et al.

Learning to derive subgoals reduces the gap between experts and students and makes students prepared for future problem solving. Researchers have explored subgoal labeled instructional materials with explanations in traditional problem solving and within tutoring systems to help novices learn to subgoal. However, only a little research is found on problem-solving strategies in relationship with subgoal learning. Also, these strategies are under-explored within computer-based tutors and learning environments. Backward problem-solving strategy is closely related to the process of subgoaling, where problem solving iteratively refines the goal into a new subgoal to reduce difficulty. In this paper, we explore a training strategy for backward strategy learning within an intelligent logic tutor that teaches logic proof construction. The training session involved backward worked examples (BWE) and problem-solving (BPS) to help students learn backward strategy towards improving their subgoaling and problem-solving skills. To evaluate the training strategy, we analyzed students' 1) experience with and engagement in learning backward strategy, 2) performance, and 3) proof construction approaches in new problems that they solved independently without tutor help after each level of training and in post-test. Our results showed that, when new problems were given to solve without any tutor help, students who were trained with both BWE and BPS outperformed students who received none of the treatment or only BWE during training. Additionally, students trained with both BWE and BPS derived subgoals during proof construction with significantly higher efficiency than the other two groups.

HCMar 18, 2023
The Power of Nudging: Exploring Three Interventions for Metacognitive Skills Instruction across Intelligent Tutoring Systems

Mark Abdelshiheed, John Wesley Hostetter, Preya Shabrina et al.

Deductive domains are typical of many cognitive skills in that no single problem-solving strategy is always optimal for solving all problems. It was shown that students who know how and when to use each strategy (StrTime) outperformed those who know neither and stick to the default strategy (Default). In this work, students were trained on a logic tutor that supports a default forward-chaining and a backward-chaining (BC) strategy, then a probability tutor that only supports BC. We investigated three types of interventions on teaching the Default students how and when to use which strategy on the logic tutor: Example, Nudge and Presented. Meanwhile, StrTime students received no interventions. Overall, our results show that Nudge outperformed their Default peers and caught up with StrTime on both tutors.

CYApr 23, 2023
Bridging Declarative, Procedural, and Conditional Metacognitive Knowledge Gap Using Deep Reinforcement Learning

Mark Abdelshiheed, John Wesley Hostetter, Tiffany Barnes et al.

In deductive domains, three metacognitive knowledge types in ascending order are declarative, procedural, and conditional learning. This work leverages Deep Reinforcement Learning (DRL) in providing adaptive metacognitive interventions to bridge the gap between the three knowledge types and prepare students for future learning across Intelligent Tutoring Systems (ITSs). Students received these interventions that taught how and when to use a backward-chaining (BC) strategy on a logic tutor that supports a default forward-chaining strategy. Six weeks later, we trained students on a probability tutor that only supports BC without interventions. Our results show that on both ITSs, DRL bridged the metacognitive knowledge gap between students and significantly improved their learning performance over their control peers. Furthermore, the DRL policy adapted to the metacognitive development on the logic tutor across declarative, procedural, and conditional students, causing their strategic decisions to be more autonomous.

AIJul 7, 2022
Enhancing a Student Productivity Model for Adaptive Problem-Solving Assistance

Mehak Maniktala, Min Chi, Tiffany Barnes

Research on intelligent tutoring systems has been exploring data-driven methods to deliver effective adaptive assistance. While much work has been done to provide adaptive assistance when students seek help, they may not seek help optimally. This had led to the growing interest in proactive adaptive assistance, where the tutor provides unsolicited assistance upon predictions of struggle or unproductivity. Determining when and whether to provide personalized support is a well-known challenge called the assistance dilemma. Addressing this dilemma is particularly challenging in open-ended domains, where there can be several ways to solve problems. Researchers have explored methods to determine when to proactively help students, but few of these methods have taken prior hint usage into account. In this paper, we present a novel data-driven approach to incorporate students' hint usage in predicting their need for help. We explore its impact in an intelligent tutor that deals with the open-ended and well-structured domain of logic proofs. We present a controlled study to investigate the impact of an adaptive hint policy based on predictions of HelpNeed that incorporate students' hint usage. We show empirical evidence to support that such a policy can save students a significant amount of time in training, and lead to improved posttest results, when compared to a control without proactive interventions. We also show that incorporating students' hint usage significantly improves the adaptive hint policy's efficacy in predicting students' HelpNeed, thereby reducing training unproductivity, reducing possible help avoidance, and increasing possible help appropriateness (a higher chance of receiving help when it was likely to be needed). We conclude with suggestions on the domains that can benefit from this approach as well as the requirements for adoption.

LGJan 28, 2023
Variational Latent Branching Model for Off-Policy Evaluation

Qitong Gao, Ge Gao, Min Chi et al.

Model-based methods have recently shown great potential for off-policy evaluation (OPE); offline trajectories induced by behavioral policies are fitted to transitions of Markov decision processes (MDPs), which are used to rollout simulated trajectories and estimate the performance of policies. Model-based OPE methods face two key challenges. First, as offline trajectories are usually fixed, they tend to cover limited state and action space. Second, the performance of model-based methods can be sensitive to the initialization of their parameters. In this work, we propose the variational latent branching model (VLBM) to learn the transition function of MDPs by formulating the environmental dynamics as a compact latent space, from which the next states and rewards are then sampled. Specifically, VLBM leverages and extends the variational inference framework with the recurrent state alignment (RSA), which is designed to capture as much information underlying the limited training data, by smoothing out the information flow between the variational (encoding) and generative (decoding) part of VLBM. Moreover, we also introduce the branching architecture to improve the model's robustness against randomly initialized model weights. The effectiveness of the VLBM is evaluated on the deep OPE (DOPE) benchmark, from which the training trajectories are designed to result in varied coverage of the state-action space. We show that the VLBM outperforms existing state-of-the-art OPE methods in general.

LGFeb 18, 2023
HOPE: Human-Centric Off-Policy Evaluation for E-Learning and Healthcare

Ge Gao, Song Ju, Markel Sanz Ausin et al.

Reinforcement learning (RL) has been extensively researched for enhancing human-environment interactions in various human-centric tasks, including e-learning and healthcare. Since deploying and evaluating policies online are high-stakes in such tasks, off-policy evaluation (OPE) is crucial for inducing effective policies. In human-centric environments, however, OPE is challenging because the underlying state is often unobservable, while only aggregate rewards can be observed (students' test scores or whether a patient is released from the hospital eventually). In this work, we propose a human-centric OPE (HOPE) to handle partial observability and aggregated rewards in such environments. Specifically, we reconstruct immediate rewards from the aggregated rewards considering partial observability to estimate expected total returns. We provide a theoretical bound for the proposed method, and we have conducted extensive experiments in real-world human-centric tasks, including sepsis treatments and an intelligent tutoring system. Our approach reliably predicts the returns of different policies and outperforms state-of-the-art benchmarks using both standard validation methods and human-centric significance tests.

LGMar 15, 2022
Reconstructing Missing EHRs Using Time-Aware Within- and Cross-Visit Information for Septic Shock Early Prediction

Ge Gao, Farzaneh Khoshnevisan, Min Chi

Real-world Electronic Health Records (EHRs) are often plagued by a high rate of missing data. In our EHRs, for example, the missing rates can be as high as 90% for some features, with an average missing rate of around 70% across all features. We propose a Time-Aware Dual-Cross-Visit missing value imputation method, named TA-DualCV, which spontaneously leverages multivariate dependencies across features and longitudinal dependencies both within- and cross-visit to maximize the information extracted from limited observable records in EHRs. Specifically, TA-DualCV captures the latent structure of missing patterns across measurements of different features and it also considers the time continuity and capture the latent temporal missing patterns based on both time-steps and irregular time-intervals. TA-DualCV is evaluated using three large real-world EHRs on two types of tasks: an unsupervised imputation task by varying mask rates up to 90% and a supervised 24-hour early prediction of septic shock using Long Short-Term Memory (LSTM). Our results show that TA-DualCV performs significantly better than all of the existing state-of-the-art imputation baselines, such as DETROIT and TAME, on both types of tasks.

LGOct 11, 2023
Off-Policy Evaluation for Human Feedback

Qitong Gao, Ge Gao, Juncheng Dong et al.

Off-policy evaluation (OPE) is important for closing the gap between offline training and evaluation of reinforcement learning (RL), by estimating performance and/or rank of target (evaluation) policies using offline trajectories only. It can improve the safety and efficiency of data collection and policy testing procedures in situations where online deployments are expensive, such as healthcare. However, existing OPE methods fall short in estimating human feedback (HF) signals, as HF may be conditioned over multiple underlying factors and is only sparsely available; as opposed to the agent-defined environmental rewards (used in policy optimization), which are usually determined over parametric functions or distributions. Consequently, the nature of HF signals makes extrapolating accurate OPE estimations to be challenging. To resolve this, we introduce an OPE for HF (OPEHF) framework that revives existing OPE methods in order to accurately evaluate the HF signals. Specifically, we develop an immediate human reward (IHR) reconstruction approach, regularized by environmental knowledge distilled in a latent space that captures the underlying dynamics of state transitions as well as issuing HF signals. Our approach has been tested over two real-world experiments, adaptive in-vivo neurostimulation and intelligent tutoring, as well as in a simulation environment (visual Q&A). Results show that our approach significantly improves the performance toward estimating HF signals accurately, compared to directly applying (variants of) existing OPE methods.

LGFeb 24
A Generalized Apprenticeship Learning Framework for Capturing Evolving Student Pedagogical Strategies

Md Mirajul Islam, Xi Yang, Adittya Soukarjya Saha et al.

Reinforcement Learning (RL) and Deep Reinforcement Learning (DRL) have advanced rapidly in recent years and have been successfully applied to e-learning environments like intelligent tutoring systems (ITSs). Despite great success, the broader application of DRL to educational technologies has been limited due to major challenges such as sample inefficiency and difficulty designing the reward function. In contrast, Apprenticeship Learning (AL) uses a few expert demonstrations to infer the expert's underlying reward functions and derive decision-making policies that generalize and replicate optimal behavior. In this work, we leverage a generalized AL framework, THEMES, to induce effective pedagogical policies by capturing the complexities of the expert student learning process, where multiple reward functions may dynamically evolve over time. We evaluate the effectiveness of THEMES against six state-of-the-art baselines, demonstrating its superior performance and highlighting its potential as a powerful alternative for inducing effective pedagogical policies and show that it can achieve high performance, with an AUC of 0.899 and a Jaccard of 0.653, using only 18 trajectories of a previous semester to predict student pedagogical decisions in a later semester.

LGNov 6, 2025
FoodRL: A Reinforcement Learning Ensembling Framework For In-Kind Food Donation Forecasting

Esha Sharma, Lauren Davis, Julie Ivy et al.

Food banks are crucial for alleviating food insecurity, but their effectiveness hinges on accurately forecasting highly volatile in-kind donations to ensure equitable and efficient resource distribution. Traditional forecasting models often fail to maintain consistent accuracy due to unpredictable fluctuations and concept drift driven by seasonal variations and natural disasters such as hurricanes in the Southeastern U.S. and wildfires in the West Coast. To address these challenges, we propose FoodRL, a novel reinforcement learning (RL) based metalearning framework that clusters and dynamically weights diverse forecasting models based on recent performance and contextual information. Evaluated on multi-year data from two structurally distinct U.S. food banks-one large regional West Coast food bank affected by wildfires and another state-level East Coast food bank consistently impacted by hurricanes, FoodRL consistently outperforms baseline methods, particularly during periods of disruption or decline. By delivering more reliable and adaptive forecasts, FoodRL can facilitate the redistribution of food equivalent to 1.7 million additional meals annually, demonstrating its significant potential for social impact as well as adaptive ensemble learning for humanitarian supply chains.

LGFeb 25, 2025
Iterative Counterfactual Data Augmentation

Mitchell Plyler, Min Chi

Counterfactual data augmentation (CDA) is a method for controlling information or biases in training datasets by generating a complementary dataset with typically opposing biases. Prior work often either relies on hand-crafted rules or algorithmic CDA methods which can leave unwanted information in the augmented dataset. In this work, we show iterative CDA (ICDA) with initial, high-noise interventions can converge to a state with significantly lower noise. Our ICDA procedure produces a dataset where one target signal in the training dataset maintains high mutual information with a corresponding label and the information of spurious signals are reduced. We show training on the augmented datasets produces rationales on documents that better align with human annotation. Our experiments include six human produced datasets and two large-language model generated datasets.

3.7LGMar 31
Hierarchical Apprenticeship Learning from Imperfect Demonstrations with Evolving Rewards

Md Mirajul Islam, Rajesh Debnath, Adittya Soukarjya Saha et al.

While apprenticeship learning has shown promise for inducing effective pedagogical policies directly from student interactions in e-learning environments, most existing approaches rely on optimal or near-optimal expert demonstrations under a fixed reward. Real-world student interactions, however, are often inherently imperfect and evolving: students explore, make errors, revise strategies, and refine their goals as understanding develops. In this work, we argue that imperfect student demonstrations are not noise to be discarded, but structured signals-provided their relative quality is ranked. We introduce HALIDE, Hierarchical Apprenticeship Learning from Imperfect Demonstrations with Evolving Rewards, which not only leverages sub-optimal student demonstrations, but ranks them within a hierarchical learning framework. HALIDE models student behavior at multiple levels of abstraction, enabling inference of higher-level intent and strategy from suboptimal actions while explicitly capturing the temporal evolution of student reward functions. By integrating demonstration quality into hierarchical reward inference,HALIDE distinguishes transient errors from suboptimal strategies and meaningful progress toward higher-level learning goals. Our results show that HALIDE more accurately predicts student pedagogical decisions than approaches that rely on optimal trajectories, fixed rewards, or unranked imperfect demonstrations.

LGJun 26, 2025
Gradient-Based Neuroplastic Adaptation for Concurrent Optimization of Neuro-Fuzzy Networks

John Wesley Hostetter, Min Chi

Neuro-fuzzy networks (NFNs) are transparent, symbolic, and universal function approximations that perform as well as conventional neural architectures, but their knowledge is expressed as linguistic IF-THEN rules. Despite these advantages, their systematic design process remains a challenge. Existing work will often sequentially build NFNs by inefficiently isolating parametric and structural identification, leading to a premature commitment to brittle and subpar architecture. We propose a novel application-independent approach called gradient-based neuroplastic adaptation for the concurrent optimization of NFNs' parameters and structure. By recognizing that NFNs' parameters and structure should be optimized simultaneously as they are deeply conjoined, settings previously unapproachable for NFNs are now accessible, such as the online reinforcement learning of NFNs for vision-based tasks. The effectiveness of concurrently optimizing NFNs is empirically shown as it is trained by online reinforcement learning to proficiently play challenging scenarios from a vision-based video game called DOOM.

LGOct 26, 2024
Off-Policy Selection for Initiating Human-Centric Experimental Design

Ge Gao, Xi Yang, Qitong Gao et al.

In human-centric tasks such as healthcare and education, the heterogeneity among patients and students necessitates personalized treatments and instructional interventions. While reinforcement learning (RL) has been utilized in those tasks, off-policy selection (OPS) is pivotal to close the loop by offline evaluating and selecting policies without online interactions, yet current OPS methods often overlook the heterogeneity among participants. Our work is centered on resolving a pivotal challenge in human-centric systems (HCSs): how to select a policy to deploy when a new participant joining the cohort, without having access to any prior offline data collected over the participant? We introduce First-Glance Off-Policy Selection (FPS), a novel approach that systematically addresses participant heterogeneity through sub-group segmentation and tailored OPS criteria to each sub-group. By grouping individuals with similar traits, FPS facilitates personalized policy selection aligned with unique characteristics of each participant or group of participants. FPS is evaluated via two important but challenging applications, intelligent tutoring systems and a healthcare application for sepsis treatment and intervention. FPS presents significant advancement in enhancing learning outcomes of students and in-hospital care outcomes.

LGJun 4, 2024
A Generalized Apprenticeship Learning Framework for Modeling Heterogeneous Student Pedagogical Strategies

Md Mirajul Islam, Xi Yang, John Hostetter et al.

A key challenge in e-learning environments like Intelligent Tutoring Systems (ITSs) is to induce effective pedagogical policies efficiently. While Deep Reinforcement Learning (DRL) often suffers from sample inefficiency and reward function design difficulty, Apprenticeship Learning(AL) algorithms can overcome them. However, most AL algorithms can not handle heterogeneity as they assume all demonstrations are generated with a homogeneous policy driven by a single reward function. Still, some AL algorithms which consider heterogeneity, often can not generalize to large continuous state space and only work with discrete states. In this paper, we propose an expectation-maximization(EM)-EDM, a general AL framework to induce effective pedagogical policies from given optimal or near-optimal demonstrations, which are assumed to be driven by heterogeneous reward functions. We compare the effectiveness of the policies induced by our proposed EM-EDM against four AL-based baselines and two policies induced by DRL on two different but related tasks that involve pedagogical action prediction. Our overall results showed that, for both tasks, EM-EDM outperforms the four AL baselines across all performance metrics and the two DRL baselines. This suggests that EM-EDM can effectively model complex student pedagogical decision-making processes through the ability to manage a large, continuous state space and adapt to handle diverse and heterogeneous reward functions with very few given demonstrations.

LGMay 15, 2023
An Offline Time-aware Apprenticeship Learning Framework for Evolving Reward Functions

Xi Yang, Ge Gao, Min Chi

Apprenticeship learning (AL) is a process of inducing effective decision-making policies via observing and imitating experts' demonstrations. Most existing AL approaches, however, are not designed to cope with the evolving reward functions commonly found in human-centric tasks such as healthcare, where offline learning is required. In this paper, we propose an offline Time-aware Hierarchical EM Energy-based Sub-trajectory (THEMES) AL framework to tackle the evolving reward functions in such tasks. The effectiveness of THEMES is evaluated via a challenging task -- sepsis treatment. The experimental results demonstrate that THEMES can significantly outperform competitive state-of-the-art baselines.

CLJan 13, 2022
Making a (Counterfactual) Difference One Rationale at a Time

Mitchell Plyler, Michael Green, Min Chi

Rationales, snippets of extracted text that explain an inference, have emerged as a popular framework for interpretable natural language processing (NLP). Rationale models typically consist of two cooperating modules: a selector and a classifier with the goal of maximizing the mutual information (MMI) between the "selected" text and the document label. Despite their promises, MMI-based methods often pick up on spurious text patterns and result in models with nonsensical behaviors. In this work, we investigate whether counterfactual data augmentation (CDA), without human assistance, can improve the performance of the selector by lowering the mutual information between spurious signals and the document label. Our counterfactuals are produced in an unsupervised fashion using class-dependent generative models. From an information theoretic lens, we derive properties of the unaugmented dataset for which our CDA approach would succeed. The effectiveness of CDA is empirically evaluated by comparing against several baselines including an improved MMI-based rationale schema on two multi aspect datasets. Our results show that CDA produces rationales that better capture the signal of interest.

LGMay 6, 2021
Time-Aware Q-Networks: Resolving Temporal Irregularity for Deep Reinforcement Learning

Yeo Jin Kim, Min Chi

Deep Reinforcement Learning (DRL) has shown outstanding performance on inducing effective action policies that maximize expected long-term return on many complex tasks. Much of DRL work has been focused on sequences of events with discrete time steps and ignores the irregular time intervals between consecutive events. Given that in many real-world domains, data often consists of temporal sequences with irregular time intervals, and it is important to consider the time intervals between temporal events to capture latent progressive patterns of states. In this work, we present a general Time-Aware RL framework: Time-aware Q-Networks (TQN), which takes into account physical time intervals within a deep RL framework. TQN deals with time irregularity from two aspects: 1) elapsed time in the past and an expected next observation time for time-aware state approximation, and 2) action time window for the future for time-aware discounting of rewards. Experimental results show that by capturing the underlying structures in the sequences with time irregularities from both aspects, TQNs significantly outperform DQN in four types of contexts with irregular time intervals. More specifically, our results show that in classic RL tasks such as CartPole and MountainCar and Atari benchmark with randomly segmented time intervals, time-aware discounting alone is more important while in the real-world tasks such as nuclear reactor operation and septic patient treatment with intrinsic time intervals, both time-aware state and time-aware discounting are crucial. Moreover, to improve the agent's learning capacity, we explored three boosting methods: Double networks, Dueling networks, and Prioritized Experience Replay, and our results show that for the two real-world tasks, combining all three boosting methods with TQN is especially effective.

LGMay 2, 2021
InferNet for Delayed Reinforcement Tasks: Addressing the Temporal Credit Assignment Problem

Markel Sanz Ausin, Hamoon Azizsoltani, Song Ju et al.

The temporal Credit Assignment Problem (CAP) is a well-known and challenging task in AI. While Reinforcement Learning (RL), especially Deep RL, works well when immediate rewards are available, it can fail when only delayed rewards are available or when the reward function is noisy. In this work, we propose delegating the CAP to a Neural Network-based algorithm named InferNet that explicitly learns to infer the immediate rewards from the delayed rewards. The effectiveness of InferNet was evaluated on two online RL tasks: a simple GridWorld and 40 Atari games; and two offline RL tasks: GridWorld and a real-life Sepsis treatment task. For all tasks, the effectiveness of using the InferNet inferred rewards is compared against the immediate and the delayed rewards with two settings: with noisy rewards and without noise. Overall, our results show that the effectiveness of InferNet is robust against noisy reward functions and is an effective add-on mechanism for solving temporal CAP in a wide range of RL tasks, from classic RL simulation environments to a real-world RL problem and for both online and offline learning.

HCFeb 10, 2021
The Impact of Looking Further Ahead: A Comparison of Two Data-driven Unsolicited Hint Types on Performance in an Intelligent Data-driven Logic Tutor

Christa Cody, Mehak Maniktala, Nicholas Lytle et al.

Research has shown assistance can provide many benefits to novices lacking the mental models needed for problem solving in a new domain. However, varying approaches to assistance, such as subgoals and next-step hints, have been implemented with mixed results. Next-Step hints are common in data-driven tutors due to their straightforward generation from historical student data, as well as research showing positive impacts on student learning. However, there is a lack of research exploring the possibility of extending data-driven methods to provide higher-level assistance. Therefore, we modified our data-driven Next-Step hint generator to provide Waypoints, hints that are a few steps ahead, representing problem-solving subgoals. We hypothesized that Waypoints would benefit students with high prior knowledge, and that Next-Step hints would most benefit students with lower prior knowledge. In this study, we investigated the influence of data-driven hint type, Waypoints versus Next-Step hints, on student learning in a logic proof tutoring system, Deep Thought, in a discrete mathematics course. We found that Next-Step hints were more beneficial for the majority of students in terms of time, efficiency, and accuracy on the posttest. However, higher totals of successfully used Waypoints were correlated with improvements in efficiency and time in the posttest. These results suggest that Waypoint hints could be beneficial, but more scaffolding may be needed to help students follow them.

LGJan 15, 2021
TC-DTW: Accelerating Multivariate Dynamic Time Warping Through Triangle Inequality and Point Clustering

Daniel Shen, Min Chi

Dynamic time warping (DTW) plays an important role in analytics on time series. Despite the large body of research on speeding up univariate DTW, the method for multivariate DTW has not been improved much in the last two decades. The most popular algorithm used today is still the one developed seventeen years ago. This paper presents a solution that, as far as we know, for the first time consistently outperforms the classic multivariate DTW algorithm across dataset sizes, series lengths, data dimensions, temporal window sizes, and machines. The new solution, named TC-DTW, introduces Triangle Inequality and Point Clustering into the algorithm design on lower bound calculations for multivariate DTW. In experiments on DTW-based nearest neighbor finding, the new solution avoids as much as 98% (60% average) DTW distance calculations and yields as much as 25X (7.5X average) speedups.

LGOct 26, 2020
An Adversarial Domain Separation Framework for Septic Shock Early Prediction Across EHR Systems

Farzaneh Khoshnevisan, Min Chi

Modeling patient disease progression using Electronic Health Records (EHRs) is critical to assist clinical decision making. While most of prior work has mainly focused on developing effective disease progression models using EHRs collected from an individual medical system, relatively little work has investigated building robust yet generalizable diagnosis models across different systems. In this work, we propose a general domain adaptation (DA) framework that tackles two categories of discrepancies in EHRs collected from different medical systems: one is caused by heterogeneous patient populations (covariate shift) and the other is caused by variations in data collection procedures (systematic bias). Prior research in DA has mainly focused on addressing covariate shift but not systematic bias. In this work, we propose an adversarial domain separation framework that addresses both categories of discrepancies by maintaining one globally-shared invariant latent representation across all systems} through an adversarial learning process, while also allocating a domain-specific model for each system to extract local latent representations that cannot and should not be unified across systems. Moreover, our proposed framework is based on variational recurrent neural network (VRNN) because of its ability to capture complex temporal dependencies and handling missing values in time-series data. We evaluate our framework for early diagnosis of an extremely challenging condition, septic shock, using two real-world EHRs from distinct medical systems in the U.S. The results show that by separating globally-shared from domain-specific representations, our framework significantly improves septic shock early prediction performance in both EHRs and outperforms the current state-of-the-art DA models.

AIOct 8, 2020
Extending the Hint Factory for the assistance dilemma: A novel, data-driven HelpNeed Predictor for proactive problem-solving help

Mehak Maniktala, Christa Cody, Amy Isvik et al.

Determining when and whether to provide personalized support is a well-known challenge called the assistance dilemma. A core problem in solving the assistance dilemma is the need to discover when students are unproductive so that the tutor can intervene. Such a task is particularly challenging for open-ended domains, even those that are well-structured with defined principles and goals. In this paper, we present a set of data-driven methods to classify, predict, and prevent unproductive problem-solving steps in the well-structured open-ended domain of logic. This approach leverages and extends the Hint Factory, a set of methods that leverages prior student solution attempts to build data-driven intelligent tutors. We present a HelpNeed classification, that uses prior student data to determine when students are likely to be unproductive and need help learning optimal problem-solving strategies. We present a controlled study to determine the impact of an Adaptive pedagogical policy that provides proactive hints at the start of each step based on the outcomes of our HelpNeed predictor: productive vs. unproductive. Our results show that the students in the Adaptive condition exhibited better training behaviors, with lower help avoidance, and higher help appropriateness (a higher chance of receiving help when it was likely to be needed), as measured using the HelpNeed classifier, when compared to the Control. Furthermore, the results show that the students who received Adaptive hints based on HelpNeed predictions during training significantly outperform their Control peers on the posttest, with the former producing shorter, more optimal solutions in less time. We conclude with suggestions on how these HelpNeed methods could be applied in other well-structured open-ended domains.

AISep 28, 2020
Avoiding Help Avoidance: Using Interface Design Changes to Promote Unsolicited Hint Usage in an Intelligent Tutor

Mehak Maniktala, Christa Cody, Tiffany Barnes et al.

Within intelligent tutoring systems, considerable research has investigated hints, including how to generate data-driven hints, what hint content to present, and when to provide hints for optimal learning outcomes. However, less attention has been paid to how hints are presented. In this paper, we propose a new hint delivery mechanism called "Assertions" for providing unsolicited hints in a data-driven intelligent tutor. Assertions are partially-worked example steps designed to appear within a student workspace, and in the same format as student-derived steps, to show students a possible subgoal leading to the solution. We hypothesized that Assertions can help address the well-known hint avoidance problem. In systems that only provide hints upon request, hint avoidance results in students not receiving hints when they are needed. Our unsolicited Assertions do not seek to improve student help-seeking, but rather seek to ensure students receive the help they need. We contrast Assertions with Messages, text-based, unsolicited hints that appear after student inactivity. Our results show that Assertions significantly increase unsolicited hint usage compared to Messages. Further, they show a significant aptitude-treatment interaction between Assertions and prior proficiency, with Assertions leading students with low prior proficiency to generate shorter (more efficient) posttest solutions faster. We also present a clustering analysis that shows patterns of productive persistence among students with low prior knowledge when the tutor provides unsolicited help in the form of Assertions. Overall, this work provides encouraging evidence that hint presentation can significantly impact how students use them and using Assertions can be an effective way to address help avoidance.