A Bandit-Based Approach to Educational Recommender Systems: Contextual Thompson Sampling for Learner Skill Gain Optimization
This work addresses the challenge of scaling personalized practice for learners in Operations Research, Management Science, and Analytics. The contribution is incremental, since it applies an existing bandit method (contextual Thompson sampling) to a new educational context.
The paper tackles the problem of providing personalized exercise sequences in digital educational environments. It introduces a method that selects exercises to maximize learner skill gain, and finds that the method recommends exercises associated with greater skill improvement and adapts effectively to individual differences.
In recent years, instructional practices in Operations Research (OR), Management Science (MS), and Analytics have increasingly shifted toward digital environments, where large and diverse groups of learners make it difficult to provide practice that adapts to individual needs. This paper introduces a method that generates personalized sequences of exercises by selecting, at each step, the exercise most likely to advance a learner's understanding of a targeted skill. The method uses information about the learner and their past performance to guide these choices, and learning progress is measured as the change in estimated skill level before and after each exercise. Using data from an online mathematics tutoring platform, we find that the approach recommends exercises associated with greater skill improvement and adapts effectively to differences across learners. From an instructional perspective, the framework enables personalized practice at scale, highlights exercises with consistently strong learning value, and helps instructors identify learners who may benefit from additional support.
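The selection loop described above (pick an exercise given learner context, observe the change in estimated skill, update) can be illustrated with a minimal contextual Thompson sampling sketch. This is not the paper's implementation. It assumes a disjoint linear-Gaussian reward model per exercise (in the style of linear Thompson sampling), a numeric learner feature vector as context, and skill estimates supplied by some external estimator. The class name `LinearThompsonSampler`, its parameters, and the synthetic `true_theta` values in `simulate` are hypothetical and do not come from the paper's data.

```python
import numpy as np


class LinearThompsonSampler:
    """Hypothetical sketch: one Bayesian linear model per exercise (disjoint LinTS).

    Assumes reward = skill_after - skill_before ~ N(context @ theta_k, noise_var),
    with a Gaussian prior theta_k ~ N(0, prior_var * I).
    """

    def __init__(self, n_exercises, n_features, noise_var=1.0, prior_var=1.0, seed=0):
        self.noise_var = noise_var
        # Posterior precision A_k (starts at the prior precision I / prior_var).
        self.A = np.stack([np.eye(n_features) / prior_var for _ in range(n_exercises)])
        # Accumulated sum of reward * context / noise_var for each exercise.
        self.b = np.zeros((n_exercises, n_features))
        self.rng = np.random.default_rng(seed)

    def select(self, context):
        """Draw one weight vector per exercise from its posterior; pick the best draw."""
        scores = []
        for A_k, b_k in zip(self.A, self.b):
            cov = np.linalg.inv(A_k)  # posterior covariance
            mean = cov @ b_k          # posterior mean
            theta = self.rng.multivariate_normal(mean, cov)
            scores.append(context @ theta)
        return int(np.argmax(scores))

    def update(self, exercise, context, skill_before, skill_after):
        """Bayesian update where the reward is the estimated skill gain."""
        reward = skill_after - skill_before
        self.A[exercise] += np.outer(context, context) / self.noise_var
        self.b[exercise] += reward * context / self.noise_var


def simulate(n_rounds=2000, seed=1):
    """Synthetic check: exercise 2 has the largest (made-up) expected skill gain."""
    rng = np.random.default_rng(seed)
    true_theta = np.array([[0.1, 0.0], [0.2, 0.1], [0.6, 0.3]])  # hypothetical
    sampler = LinearThompsonSampler(n_exercises=3, n_features=2, noise_var=0.05, seed=seed)
    counts = np.zeros(3, dtype=int)
    for _ in range(n_rounds):
        context = np.array([1.0, rng.uniform(0.0, 1.0)])  # bias + hypothetical learner feature
        k = sampler.select(context)
        skill_before = rng.normal(0.0, 1.0)
        gain = context @ true_theta[k] + rng.normal(0.0, 0.2)
        sampler.update(k, context, skill_before, skill_before + gain)
        counts[k] += 1
    return counts
```

Calling `simulate()` returns how often each exercise was chosen; on this synthetic setup the sampler should concentrate on the exercise with the largest expected gain while still occasionally exploring the others. A disjoint model per exercise keeps the sketch short; a shared model over learner-by-exercise features would let information transfer across exercises, which may matter when many exercises have little data.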