Sofia S. Villar

LG
h-index11
7papers
80citations
Novelty49%
AI Score35

7 Papers

LGMay 18, 2022
Multi-disciplinary fairness considerations in machine learning for clinical trials

Isabel Chien, Nina Deliu, Richard E. Turner et al. · cambridge

While interest in the application of machine learning to improve healthcare has grown tremendously in recent years, a number of barriers prevent deployment in medical practice. A notable concern is the potential to exacerbate entrenched biases and existing health disparities in society. The area of fairness in machine learning seeks to address these issues of equity; however, appropriate approaches are context-dependent, necessitating domain-specific consideration. We focus on clinical trials, i.e., research studies conducted on humans to evaluate medical treatments. Clinical trials are a relatively under-explored application in machine learning for healthcare, in part due to complex ethical, legal, and regulatory requirements and high costs. Our aim is to provide a multi-disciplinary assessment of how fairness for machine learning fits into the context of clinical trials research and practice. We start by reviewing the current ethical considerations and guidelines for clinical trials and examine their relationship with common definitions of fairness in machine learning. We examine potential sources of unfairness in clinical trials, providing concrete examples, and discuss the role machine learning might play in either mitigating potential biases or exacerbating them when applied without care. Particular focus is given to adaptive clinical trials, which may employ machine learning. Finally, we highlight concepts that require further investigation and development, and emphasize new approaches to fairness that may be relevant to the design of clinical trials.

MLMay 8, 2022
Some performance considerations when using multi-armed bandit algorithms in the presence of missing data

Xijin Chen, Kim May Lee, Sofia S. Villar et al.

When comparing the performance of multi-armed bandit algorithms, the potential impact of missing data is often overlooked. In practice, it also affects their implementation where the simplest approach to overcome this is to continue to sample according to the original bandit algorithm, ignoring missing outcomes. We investigate the impact on performance of this approach to deal with missing data for several bandit algorithms through an extensive simulation study assuming the rewards are missing at random. We focus on two-armed bandit algorithms with binary outcomes in the context of patient allocation for clinical trials with relatively small sample sizes. However, our results apply to other applications of bandit algorithms where missing data is expected to occur. We assess the resulting operating characteristics, including the expected reward. Different probabilities of missingness in both arms are considered. The key finding of our work is that when using the simplest strategy of ignoring missing data, the impact on the expected performance of multi-armed bandit strategies varies according to the way these strategies balance the exploration-exploitation trade-off. Algorithms that are geared towards exploration continue to assign samples to the arm with more missing responses (which being perceived as the arm with less observed information is deemed more appealing by the algorithm than it would otherwise be). In contrast, algorithms that are geared towards exploitation would rapidly assign a high value to samples from the arms with a current high mean irrespective of the level observations per arm. Furthermore, for algorithms focusing more on exploration, we illustrate that the problem of missing responses can be alleviated using a simple mean imputation approach.

LGAug 12, 2025
A Personalized Exercise Assistant using Reinforcement Learning (PEARL): Results from a four-arm Randomized-controlled Trial

Amy Armento Lee, Narayan Hegde, Nina Deliu et al.

Consistent physical inactivity poses a major global health challenge. Mobile health (mHealth) interventions, particularly Just-in-Time Adaptive Interventions (JITAIs), offer a promising avenue for scalable, personalized physical activity (PA) promotion. However, developing and evaluating such interventions at scale, while integrating robust behavioral science, presents methodological hurdles. The PEARL study was the first large-scale, four-arm randomized controlled trial to assess a reinforcement learning (RL) algorithm, informed by health behavior change theory, to personalize the content and timing of PA nudges via a Fitbit app. We enrolled and randomized 13,463 Fitbit users into four study arms: control, random, fixed, and RL. The control arm received no nudges. The other three arms received nudges from a bank of 155 nudges based on behavioral science principles. The random arm received nudges selected at random. The fixed arm received nudges based on a pre-set logic from survey responses about PA barriers. The RL group received nudges selected by an adaptive RL algorithm. We included 7,711 participants in primary analyses (mean age 42.1, 86.3% female, baseline steps 5,618.2). We observed an increase in PA for the RL group compared to all other groups from baseline to 1 and 2 months. The RL group had significantly increased average daily step count at 1 month compared to all other groups: control (+296 steps, p=0.0002), random (+218 steps, p=0.005), and fixed (+238 steps, p=0.002). At 2 months, the RL group sustained a significant increase compared to the control group (+210 steps, p=0.0122). Generalized estimating equation models also revealed a sustained increase in daily steps in the RL group vs. control (+208 steps, p=0.002). These findings demonstrate the potential of a scalable, behaviorally-informed RL approach to personalize digital health interventions for PA.

LGDec 15, 2021
Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization

Tong Li, Jacob Nogas, Haochen Song et al.

Traditional randomized A/B experiments assign arms with uniform random (UR) probability, such as 50/50 assignment to two versions of a website to discover whether one version engages users more. To more quickly and automatically use data to benefit users, multi-armed bandit algorithms such as Thompson Sampling (TS) have been advocated. While TS is interpretable and incorporates the randomization key to statistical inference, it can cause biased estimates and increase false positives and false negatives in detecting differences in arm means. We introduce a more Statistically Sensitive algorithm, TS-PostDiff (Posterior Probability of Small Difference), that mixes TS with traditional UR by using an additional adaptive step, where the probability of using UR (vs TS) is proportional to the posterior probability that the difference in arms is small. This allows an experimenter to define what counts as a small difference, below which a traditional UR experiment can obtain informative data for statistical inference at low cost, and above which using more TS to maximize user benefits is key. We evaluate TS-PostDiff against UR, TS, and two other TS variants designed to improve statistical inference. We consider results for the common two-armed experiment across a range of settings inspired by real-world applications. Our results provide insight into when and why TS-PostDiff or alternative approaches provide better tradeoffs between benefiting users (reward) and statistical inference (false positive rate and power). TS-PostDiff's adaptivity helps efficiently reduce false positives and increase statistical power when differences are small, while increasing reward more when differences are large. The work highlights important considerations for future Statistically Sensitive algorithm development that balances reward and statistical analysis in adaptive experimentation.

MLOct 30, 2021
Efficient Inference Without Trading-off Regret in Bandits: An Allocation Probability Test for Thompson Sampling

Nina Deliu, Joseph J. Williams, Sofia S. Villar

Using bandit algorithms to conduct adaptive randomised experiments can minimise regret, but it poses major challenges for statistical inference (e.g., biased estimators, inflated type-I error and reduced power). Recent attempts to address these challenges typically impose restrictions on the exploitative nature of the bandit algorithm$-$trading off regret$-$and require large sample sizes to ensure asymptotic guarantees. However, large experiments generally follow a successful pilot study, which is tightly constrained in its size or duration. Increasing power in such small pilot experiments, without limiting the adaptive nature of the algorithm, can allow promising interventions to reach a larger experimental phase. In this work we introduce a novel hypothesis test, uniquely based on the allocation probabilities of the bandit algorithm, and without constraining its exploitative nature or requiring a minimum experimental size. We characterise our $Allocation\ Probability\ Test$ when applied to $Thompson\ Sampling$, presenting its asymptotic theoretical properties, and illustrating its finite-sample performances compared to state-of-the-art approaches. We demonstrate the regret and inferential advantages of our approach, particularly in small samples, in both extensive simulations and in a real-world experiment on mental health aspects.

LGMar 22, 2021
Challenges in Statistical Analysis of Data Collected by a Bandit Algorithm: An Empirical Exploration in Applications to Adaptively Randomized Experiments

Joseph Jay Williams, Jacob Nogas, Nina Deliu et al.

Multi-armed bandit algorithms have been argued for decades as useful for adaptively randomized experiments. In such experiments, an algorithm varies which arms (e.g. alternative interventions to help students learn) are assigned to participants, with the goal of assigning higher-reward arms to as many participants as possible. We applied the bandit algorithm Thompson Sampling (TS) to run adaptive experiments in three university classes. Instructors saw great value in trying to rapidly use data to give their students in the experiments better arms (e.g. better explanations of a concept). Our deployment, however, illustrated a major barrier for scientists and practitioners to use such adaptive experiments: a lack of quantifiable insight into how much statistical analysis of specific real-world experiments is impacted (Pallmann et al, 2018; FDA, 2019), compared to traditional uniform random assignment. We therefore use our case study of the ubiquitous two-arm binary reward setting to empirically investigate the impact of using Thompson Sampling instead of uniform random assignment. In this setting, using common statistical hypothesis tests, we show that collecting data with TS can as much as double the False Positive Rate (FPR; incorrectly reporting differences when none exist) and the False Negative Rate (FNR; failing to report differences when they exist)...

LGJun 9, 2020
Learning for Dose Allocation in Adaptive Clinical Trials with Safety Constraints

Cong Shen, Zhiyang Wang, Sofia S. Villar et al.

Phase I dose-finding trials are increasingly challenging as the relationship between efficacy and toxicity of new compounds (or combination of them) becomes more complex. Despite this, most commonly used methods in practice focus on identifying a Maximum Tolerated Dose (MTD) by learning only from toxicity events. We present a novel adaptive clinical trial methodology, called Safe Efficacy Exploration Dose Allocation (SEEDA), that aims at maximizing the cumulative efficacies while satisfying the toxicity safety constraint with high probability. We evaluate performance objectives that have operational meanings in practical clinical trials, including cumulative efficacy, recommendation/allocation success probabilities, toxicity violation probability, and sample efficiency. An extended SEEDA-Plateau algorithm that is tailored for the increase-then-plateau efficacy behavior of molecularly targeted agents (MTA) is also presented. Through numerical experiments using both synthetic and real-world datasets, we show that SEEDA outperforms state-of-the-art clinical trial designs by finding the optimal dose with higher success rate and fewer patients.