Temporal Dropout Risk in Learning Analytics: A Harmonized Survival Benchmark Across Dynamic and Early-Window Representations
This work addresses the need for standardized benchmarks in Learning Analytics to improve model comparability and interpretability, though it is incremental as it builds on existing survival analysis methods.
The study tackled the problem of inconsistent evaluation in student dropout prediction by introducing a harmonized survival benchmark using the Open University Learning Analytics Dataset, finding that Random Survival Forest performed best in discrimination and Brier scores in one arm, while Poisson Piecewise-Exponential led narrowly in another, with key insights showing dropout risk is primarily temporal-behavioral rather than demographic.
Student dropout is a persistent concern in Learning Analytics, yet comparative studies frequently evaluate predictive models under heterogeneous protocols, prioritizing discrimination over temporal interpretability and calibration. This study introduces a survival-oriented benchmark for temporal dropout risk modelling using the Open University Learning Analytics Dataset (OULAD). Two harmonized arms are compared: a dynamic weekly arm, with models in person-period representation, and a comparable continuous-time arm, with an expanded roster of families -- tree-based survival, parametric, and neural models. The evaluation protocol integrates four analytical layers: predictive performance, ablation, explainability, and calibration. Results are reported within each arm separately, as a single cross-arm ranking is not methodologically warranted. Within the comparable arm, Random Survival Forest leads in discrimination and horizon-specific Brier scores; within the dynamic arm, Poisson Piecewise-Exponential leads narrowly on integrated Brier score within a tight five-family cluster. No-refit bootstrap sampling variability qualifies these positions as directional signals rather than absolute superiority. Ablation and explainability analyses converged, across all families, on a shared finding: the dominant predictive signal was not primarily demographic or structural, but temporal and behavioral. Calibration corroborated this pattern in the better-discriminating models, with the exception of XGBoost AFT, which exhibited systematic bias. These results support the value of a harmonized, multi-dimensional benchmark in Learning Analytics and situate dropout risk as a temporal-behavioral process rather than a function of static background attributes.