Thomas Trask

CY
h-index2
4papers
4citations
Novelty20%
AI Score30

4 Papers

LGJun 2
When Offline Selectors Cannot Beat the Best Single Model: A Diagnostic Study on edX Dropout Prediction

Tyler Crosse, Alan Nadelsticher Ruvalcaba, Dustin Khang LeDuc et al.

Different predictors often excel on different inputs, so picking the best one per instance promises higher accuracy than committing to a single model. In practice, selectors trained from logged data routinely fail to beat the strongest single predictor. Three causes typically go unseparated before more tuning is applied: a mismatched learner, a state that does not predict which model wins, or buffer-to-deployment label shift. A three-stage diagnostic rules them out on a shared buffer. Stage~1 estimates a local ceiling on oracle recovery from $k$-NN label consistency. Stage~2 asks whether paired BC and offline-RL learners (BC, DQN, and CQL across penalty weights) reach that ceiling. Stage~3 ablates the selector state to test whether richer features would raise it. The combined verdict points to the most promising next step: tuning the learner, redesigning the state, or collecting new data. We apply it to selecting among five dropout-prediction models on edX clickstream data. Across 16 windows, the oracle beats the strongest single base model by 9.7 accuracy points on average, yet BC, DQN, and CQL land in the same test-accuracy band below it (robust to a tenfold buffer sweep and $N{=}2{,}000$ held-out examples). The bottleneck is local representational ambiguity: CQL closes the imitation gap without a deployment gain (not conservatism), regret clusters tightly across learners (not tie-breaking), and the three learners converge on test accuracy (not shift). The next iteration should change the state or collect new data, not tune the offline learner further.

LGAug 15, 2023
Analyzing the Capabilities of Nature-inspired Feature Selection Algorithms in Predicting Student Performance

Thomas Trask

Predicting student performance is key in leveraging effective pre-failure interventions for at-risk students. As educational data grows larger, more effective means of analyzing student data in a timely manner are needed in order to provide useful predictions and interventions. In this paper, an analysis was conducted to determine the relative performance of a suite of nature-inspired algorithms in the feature-selection portion of ensemble algorithms used to predict student performance. A Swarm Intelligence ML engine (SIMLe) was developed to run this suite in tandem with a series of traditional ML classification algorithms to analyze three student datasets: instance-based clickstream data, hybrid single-course performance, and student meta-performance when taking multiple courses simultaneously. These results were then compared to previous predictive algorithms and, for all datasets analyzed, it was found that leveraging an ensemble approach using nature-inspired algorithms for feature selection and traditional ML algorithms for classification significantly increased predictive accuracy while also reducing feature set size by up to 65 percent.

CYMay 19, 2024
A Comparative Analysis of Student Performance Predictions in Online Courses using Heterogeneous Knowledge Graphs

Thomas Trask, Nicholas Lytle, Michael Boyle et al.

As online courses become the norm in the higher-education landscape, investigations into student performance between students who take online vs on-campus versions of classes become necessary. While attention has been given to looking at differences in learning outcomes through comparisons of students' end performance, less attention has been given in comparing students' engagement patterns between different modalities. In this study, we analyze a heterogeneous knowledge graph consisting of students, course videos, formative assessments and their interactions to predict student performance via a Graph Convolutional Network (GCN). Using students' performance on the assessments, we attempt to determine a useful model for identifying at-risk students. We then compare the models generated between 5 on-campus and 2 fully-online MOOC-style instances of the same course. The model developed achieved a 70-90\% accuracy of predicting whether a student would pass a particular problem set based on content consumed, course instance, and modality.

CYApr 20, 2024
Ordinal Behavior Classification of Student Online Course Interactions

Thomas Trask

The study in interaction patterns between students in on-campus and MOOC-style online courses has been broadly studied for the last 11 years. Yet there remains a gap in the literature comparing the habits of students completing the same course offered in both on-campus and MOOC-style online formats. This study will look at browser-based usage patterns for students in the Georgia Tech CS1301 edx course for both the online course offered to on-campus students and the MOOCstyle course offered to anyone to determine what, if any, patterns exist between the two cohorts.