Richard Weiss

LG
4 papers
15 citations
Novelty 31%
AI Score 40

4 Papers

69.6 LG May 31
Interaction-Limited Safe Continuous-Time RL for Dynamical Medical Treatment

Xun Shen, Yuepeng Wang, Akifumi Wachi et al.

Dynamic medical treatment requires deciding treatment intensity and intervention timing, while patient states evolve continuously and adverse events may occur between clinical interactions. Most existing treatment learning methods assume fixed schedules or enforce safety only at discrete decision points. We propose Interaction-Limited Safe Continuous-Time Reinforcement Learning, a framework that jointly optimizes treatment administration and clinical interaction timing under trajectory-level safety constraints. Our key idea is to reformulate the continuous-time treatment problem as an option-based semi-Markov decision process, where each option specifies a continuous-time treatment policy and its duration. We develop a safety-tightening mechanism showing that suitably constructed constraints at interaction times guarantee safety over the full continuous-time trajectory with high probability. We further establish finite-sample guarantees for policy learning from logged treatment trajectories and introduce a practical data-driven conservative surrogate. Experiments show that the proposed adaptive interaction-timing mechanism improves both safety and treatment effectiveness over equidistant interaction schemes across different safe policy optimization methods.
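As a rough illustration of why safety checked only at interaction times can miss what happens in between, the sketch below represents an option as a policy plus a duration and rolls it out on a toy model. Everything here is an assumption made for illustration and is not taken from the paper: the `Option` type, the first-order dynamics dx/dt = -x + u, the forward-Euler integrator, the dosing rule, and the safety limit of 1.0.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Option:
    """Hypothetical option: a treatment policy plus how long to follow it."""
    policy: Callable[[float, float], float]  # (elapsed time, state) -> dose rate
    duration: float                          # time until the next interaction


def rollout(x0: float, option: Option, dt: float = 0.01) -> List[float]:
    """Forward-Euler rollout of the toy dynamics dx/dt = -x + u over one option."""
    steps = round(option.duration / dt)
    xs = [x0]
    x = x0
    for k in range(steps):
        u = option.policy(k * dt, x)
        x = x + dt * (-x + u)  # assumed first-order clearance model
        xs.append(x)
    return xs


def safe_at_interactions(xs: List[float], limit: float) -> bool:
    """Check the state only at the start and end of the option."""
    return xs[0] <= limit and xs[-1] <= limit


def safe_on_trajectory(xs: List[float], limit: float) -> bool:
    """Check every simulated state between the two interaction times."""
    return max(xs) <= limit


# Assumed dosing rule: infuse at rate 4 for the first 0.5 time units, then stop.
bolus = Option(policy=lambda t, x: 4.0 if t < 0.5 else 0.0, duration=3.0)
xs = rollout(0.0, bolus)
print("safe at interaction times:", safe_at_interactions(xs, limit=1.0))
print("safe on full trajectory:  ", safe_on_trajectory(xs, limit=1.0))
```

In this toy run the state is below the limit at both interaction times but exceeds it in between, which is the kind of gap that trajectory-level constraints are meant to close.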

36.0 LG May 31
MedGym: A Unified Continuous-Time Benchmark for Dynamic Medical Treatment Reinforcement Learning

Yuepeng Wang, Ken Kawano, Yongqi Zhou et al.

Medical treatment recommendation poses several challenges to reinforcement learning (RL): patient physiology evolves in continuous time, measurements and interventions are performed at irregular intervals, and treatment effects vary substantially across individuals. Existing RL formulations and simulated environments, however, are based on discrete-time MDP or POMDP abstractions with fixed or pre-specified decision intervals. Thus, it remains difficult to evaluate whether RL methods can handle time-interval-dependent disease progression, personalized treatment response, and safety between consecutive measurement points. To address this gap, we introduce MedGym, a benchmark environment for dynamic treatment recommendation. MedGym models longitudinal patient evolution in a continuous-time framework and constructs a configurable medical RL benchmark from clinical data by using Physics-Informed Neural Networks. The resulting benchmark supports both offline and online RL, and enables direct comparison between discrete-time and continuous-time methods under irregular treatment timing and patient-specific dynamics. In addition, MedGym supports evaluation from clinically important perspectives, including personalization, trajectory-level safety, and the performance gap between model-based offline learning and online deployment. By providing a standardized and configurable benchmark for continuous-time dynamic treatment, MedGym aims to facilitate more realistic and informative evaluation of medical RL methods.

LG Aug 16, 2024
Detecting Unsuccessful Students in Cybersecurity Exercises in Two Different Learning Environments

Valdemar Švábenský, Kristián Tkáčik, Aubrey Birdwell et al.

This full paper in the research track evaluates the usage of data logged from cybersecurity exercises in order to predict students who are potentially at risk of performing poorly. Hands-on exercises are essential for learning since they enable students to practice their skills. In cybersecurity, hands-on exercises are often complex and require knowledge of many topics. Therefore, students may miss solutions due to gaps in their knowledge and become frustrated, which impedes their learning. Targeted aid by the instructor helps, but since the instructor's time is limited, efficient ways to detect struggling students are needed. This paper develops automated tools to predict when a student is having difficulty. We formed a dataset with the actions of 313 students from two countries and two learning environments: KYPO CRP and EDURange. These data are used in machine learning algorithms to predict the success of students in exercises deployed in these environments. After extracting features from the data, we trained and cross-validated eight classifiers for predicting the exercise outcome and evaluated their predictive power. The contribution of this paper is comparing two approaches to feature engineering, modeling, and classification performance on data from two learning environments. Using the features from either learning environment, we were able to detect and distinguish between successful and struggling students. A decision tree classifier achieved the highest balanced accuracy and sensitivity with data from both learning environments. The results show that activity data from cybersecurity exercises are suitable for predicting student success. In a potential application, such models can aid instructors in detecting struggling students and providing targeted help. We publish data and code for building these models so that others can adopt or adapt them.
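For readers unfamiliar with the two metrics named above, the following sketch computes sensitivity and balanced accuracy for a binary classifier using their standard definitions. The labels and predictions are made-up values, and treating "struggling" as the positive class (label 1) is an assumption for illustration; none of this reproduces the paper's data or code.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity, robust to class imbalance."""
    sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
    return (sensitivity + specificity) / 2


# Made-up example: 1 = struggling student (assumed positive class), 0 = successful.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
print(f"balanced accuracy={balanced_accuracy(y_true, y_pred):.3f}")
```

Balanced accuracy is a common choice when one class is much rarer than the other, because plain accuracy can look high even when most of the rare class is missed.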

CY Dec 3, 2021 Code
Evaluating Two Approaches to Assessing Student Progress in Cybersecurity Exercises

Valdemar Švábenský, Richard Weiss, Jack Cook et al.

Cybersecurity students need to develop practical skills such as using command-line tools. Hands-on exercises are the most direct way to assess these skills, but assessing students' mastery is a challenging task for instructors. We aim to alleviate this issue by modeling and visualizing student progress automatically throughout the exercise. The progress is summarized by graph models based on the shell commands students typed to achieve discrete tasks within the exercise. We implemented two types of models and compared them using data from 46 students at two universities. To evaluate our models, we surveyed 22 experienced computing instructors and qualitatively analyzed their responses. The majority of instructors interpreted the graph models effectively and identified strengths, weaknesses, and assessment use cases for each model. Based on the evaluation, we provide recommendations to instructors and explain how our graph models innovate teaching and promote further research. The impact of this paper is threefold. First, it demonstrates how multiple institutions can collaborate to share approaches to modeling student progress in hands-on exercises. Second, our modeling techniques generalize to data from different environments to support student assessment, even outside the cybersecurity domain. Third, we share the acquired data and open-source software so that others can use the models in their classes or research.
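One simple way to picture a graph model built from typed shell commands is a directed graph whose edges count how often one command follows another. The sketch below is an assumption for illustration only: the function name `build_transition_graph`, the sample sessions, and the choice of a command-to-command transition graph are hypothetical and are not necessarily either of the two models evaluated in the paper.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Edge = Tuple[str, str]


def build_transition_graph(sessions: Iterable[List[str]]) -> Counter:
    """Count directed edges (command -> next command) across student sessions.

    Only the command name (first token) is kept; arguments are dropped.
    """
    edges: Counter = Counter()
    for commands in sessions:
        names = [line.split()[0] for line in commands if line.strip()]
        for current, following in zip(names, names[1:]):
            edges[(current, following)] += 1
    return edges


# Made-up command logs for two students working on the same task.
sessions = [
    ["nmap -sV 10.0.0.5", "ssh user@10.0.0.5", "ls -la", "cat flag.txt"],
    ["nmap 10.0.0.5", "ssh user@10.0.0.5", "cat flag.txt"],
]

graph = build_transition_graph(sessions)
for (src, dst), count in sorted(graph.items()):
    print(f"{src} -> {dst}: {count}")
```

Edge counts like these could be rendered as a weighted graph so that an instructor sees common paths and unusual detours at a glance.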