LG · AI · ML · Mar 4, 2019

Learning Modular Safe Policies in the Bandit Setting with Application to Adaptive Clinical Trials

arXiv:1903.01026v3 · 4 citations
Originality: Incremental advance
AI Analysis

This work aims to make adaptive clinical trials more robust for medical practitioners. The advance is incremental, since it builds on the existing BESA algorithm.

The paper tackles the sensitivity of standard bandit algorithms to outlier data in small-sample settings such as clinical trials. It defines a new robustness criterion in which regret is measured by a function of the return distribution rather than by expected return alone. It proposes a modified BESA algorithm, provides a regret bound, and reports favorable empirical performance against standard algorithms on synthetic and clinical trial data.

The stochastic multi-armed bandit problem is a well-known model for studying the exploration-exploitation trade-off. It has significant potential applications in adaptive clinical trials, which allow for dynamic changes in the treatment allocation probabilities of patients. However, most bandit learning algorithms are designed with the goal of minimizing the expected regret. While this approach is useful in many areas, in clinical trials it can be sensitive to outlier data, especially when the sample size is small. In this paper, we define and study a new robustness criterion for bandit problems. Specifically, we consider optimizing a function of the distribution of returns as a regret measure. This provides practitioners more flexibility to define an appropriate regret measure. The learning algorithm we propose to solve this type of problem is a modification of the BESA algorithm [Baransi et al., 2014] that considers a more general version of regret. We present a regret bound for our approach and evaluate it empirically on both synthetic problems and a dataset from the clinical trial literature. Our approach compares favorably to a suite of standard bandit algorithms.
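To make the idea in the abstract concrete, here is a minimal sketch of a two-armed BESA-style selection rule in which the usual empirical mean is swapped for a pluggable function of the return distribution. It is an illustration under stated assumptions, not the authors' implementation: the names `mean_variance_score`, `besa_choose`, and `run_trial` are hypothetical, the mean-minus-variance functional is only one example of a score a practitioner might choose, and the paper's exact algorithm and regret definition may differ.

```python
import random
import statistics
from typing import Callable, List, Sequence

Score = Callable[[Sequence[float]], float]


def mean_variance_score(samples: Sequence[float], risk_weight: float = 1.0) -> float:
    """Hypothetical functional of the returns: mean minus a variance penalty."""
    return statistics.fmean(samples) - risk_weight * statistics.pvariance(samples)


def besa_choose(history_a: List[float], history_b: List[float],
                score: Score, rng: random.Random) -> int:
    """Return 0 to pull arm A or 1 to pull arm B."""
    # Pull each arm once before comparing them.
    if not history_a:
        return 0
    if not history_b:
        return 1
    # BESA idea: subsample (without replacement) the larger history down to
    # the size of the smaller one, so both arms are judged on equal data.
    n = min(len(history_a), len(history_b))
    score_a = score(rng.sample(history_a, n))
    score_b = score(rng.sample(history_b, n))
    if score_a > score_b:
        return 0
    if score_b > score_a:
        return 1
    # Tie: favor the arm that has been pulled less often.
    return 0 if len(history_a) <= len(history_b) else 1


def run_trial(arms: Sequence[Callable[[random.Random], float]], horizon: int,
              score: Score = mean_variance_score, seed: int = 0) -> List[List[float]]:
    """Simulate `horizon` pulls over two arms and return each arm's rewards."""
    rng = random.Random(seed)
    histories: List[List[float]] = [[], []]
    for _ in range(horizon):
        arm = besa_choose(histories[0], histories[1], score, rng)
        histories[arm].append(arms[arm](rng))
    return histories


if __name__ == "__main__":
    # Assumed toy arms: a steady treatment and a slightly better but noisy one.
    toy_arms = [lambda r: r.gauss(0.5, 0.1), lambda r: r.gauss(0.6, 2.0)]
    pulls = run_trial(toy_arms, horizon=200)
    print("pulls per arm:", [len(h) for h in pulls])
```

Passing a different `score` (for example `statistics.fmean` alone) gives a BESA-style rule that compares plain empirical means, which is one way to see how the choice of functional changes which arm counts as "best".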

