Esmaeil Keyvanshokooh

LG
h-index: 16
4 papers
8 citations
Novelty: 53%
AI Score: 31

4 Papers

LG · Oct 18, 2024
HR-Bandit: Human-AI Collaborated Linear Recourse Bandit

Junyu Cao, Ruijiang Gao, Esmaeil Keyvanshokooh

Human doctors frequently recommend actionable recourses that allow patients to modify their conditions to access more effective treatments. Inspired by such healthcare scenarios, we propose the Recourse Linear UCB ($\textsf{RLinUCB}$) algorithm, which optimizes both action selection and feature modifications by balancing exploration and exploitation. We further extend this to the Human-AI Linear Recourse Bandit ($\textsf{HR-Bandit}$), which integrates human expertise to enhance performance. $\textsf{HR-Bandit}$ offers three key guarantees: (i) a warm-start guarantee for improved initial performance, (ii) a human-effort guarantee to minimize required human interactions, and (iii) a robustness guarantee that ensures sublinear regret even when human decisions are suboptimal. Empirical results, including a healthcare case study, validate its superior performance against existing benchmarks.
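For orientation, the exploration-exploitation balance mentioned above can be illustrated with a minimal sketch of standard disjoint LinUCB. This is not $\textsf{RLinUCB}$ or $\textsf{HR-Bandit}$: it has no recourse (feature modification) step and no human input. The class name `LinUCBArm`, the helper `select_arm`, and the parameters `ridge` and `alpha` are illustrative assumptions, not names from the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


class LinUCBArm:
    """Ridge-regression state for one arm of standard LinUCB (assumed setup)."""

    def __init__(self, dim, ridge=1.0):
        # A_inv stores (ridge * I + sum of x x^T)^-1; b stores sum of reward * x.
        self.A_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _matvec(self, v):
        return [dot(row, v) for row in self.A_inv]

    def ucb(self, x, alpha):
        """Estimated reward plus an exploration bonus scaled by alpha."""
        theta = self._matvec(self.b)              # ridge estimate A^-1 b
        width = math.sqrt(dot(x, self._matvec(x)))
        return dot(theta, x) + alpha * width

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A^-1 after adding x x^T to A.
        A_inv_x = self._matvec(x)
        denom = 1.0 + dot(x, A_inv_x)
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.A_inv[i][j] -= A_inv_x[i] * A_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select_arm(arms, x, alpha):
    """Pick the arm index with the largest upper confidence bound."""
    scores = [arm.ucb(x, alpha) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)
```

A larger `alpha` widens the confidence bonus and favors exploration; a recourse-aware variant would additionally search over feasible modifications of `x`, which this sketch does not attempt.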

LG · Feb 3, 2024
Online Uniform Sampling: Randomized Learning-Augmented Approximation Algorithms with Application to Digital Health

Xueqing Liu, Kyra Gan, Esmaeil Keyvanshokooh et al.

Motivated by applications in digital health, this work studies the novel problem of online uniform sampling (OUS), where the goal is to distribute a sampling budget uniformly across unknown decision times. In the OUS problem, the algorithm is given a budget $b$ and a time horizon $T$, and an adversary then chooses a value $\tau^* \in [b,T]$, which is revealed to the algorithm online. At each decision time $i \in [\tau^*]$, the algorithm must determine a sampling probability that maximizes the budget spent throughout the horizon, respecting budget constraint $b$, while achieving as uniform a distribution as possible over $\tau^*$. We present the first randomized algorithm designed for this problem and subsequently extend it to incorporate learning augmentation. We provide worst-case approximation guarantees for both algorithms, and illustrate the utility of the algorithms through both synthetic experiments and a real-world case study involving the HeartSteps mobile application. Our numerical results show strong empirical average performance of our proposed randomized algorithms against previously proposed heuristic solutions.
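To see why the unknown $\tau^*$ makes the problem nontrivial, consider a naive constant-probability baseline. This is an illustration only, not the paper's randomized algorithm, and it treats the budget constraint in expectation as a simplifying assumption. Sampling with probability $p$ at each of $\tau^*$ decision times spends $p \cdot \tau^*$ in expectation, so the pessimistic choice $p = b/T$ is perfectly uniform but uses only a $\tau^*/T$ fraction of the budget. The function names are hypothetical.

```python
def naive_probability(b, T):
    """Constant sampling probability that assumes the worst case tau* = T."""
    return b / T


def expected_spend(p, tau_star):
    """Expected number of samples when each of tau_star times is sampled w.p. p."""
    return p * tau_star


def budget_utilization(b, T, tau_star):
    """Fraction of the budget b the naive policy uses in expectation."""
    return expected_spend(naive_probability(b, T), tau_star) / b
```

With $b = 10$, $T = 100$, and $\tau^* = 50$, the naive policy spends 5 of its 10 units in expectation. The ideal probability $b/\tau^*$ would spend the full budget, but it cannot be computed up front because $\tau^*$ is only revealed online.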

ML · May 22, 2025
Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine

Prateek Jaiswal, Esmaeil Keyvanshokooh, Junyu Cao

Randomized clinical trials often require large patient cohorts before drawing definitive conclusions, yet abundant observational data from parallel studies remains underutilized due to confounding and hidden biases. To bridge this gap, we propose Deconfounded Warm-Start Thompson Sampling (DWTS), a practical approach that leverages a Doubly Debiased LASSO (DDL) procedure to identify a sparse set of reliable measured covariates and combines them with key hidden covariates to form a reduced context. By initializing Thompson Sampling (LinTS) priors with DDL-estimated means and variances on these measured features -- while keeping uninformative priors on hidden features -- DWTS effectively harnesses confounded observational data to kick-start adaptive clinical trials. Evaluated on both a purely synthetic environment and a virtual environment created using a real cardiovascular risk dataset, DWTS consistently achieves lower cumulative regret than standard LinTS, showing how offline causal insights from observational data can improve trial efficiency and support more personalized treatment decisions.
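The warm-start idea can be sketched in a much simpler setting than the paper's: independent Gaussian arms with known noise variance rather than LinTS, and no DDL step. In this assumed setup, an informative prior mean and a small prior variance stand in for offline estimates, and a large prior variance plays the role of an uninformative prior. The names `GaussianArm` and `choose_arm` are hypothetical.

```python
import random


class GaussianArm:
    """Normal prior/posterior over one arm's mean reward (known noise variance)."""

    def __init__(self, prior_mean, prior_var, noise_var=1.0):
        self.mean = prior_mean      # warm-start value, e.g. an offline estimate
        self.var = prior_var        # small = confident prior, large = uninformative
        self.noise_var = noise_var

    def sample(self, rng):
        """Draw one plausible mean reward from the current posterior."""
        return rng.gauss(self.mean, self.var ** 0.5)

    def update(self, reward):
        # Conjugate normal-normal update: precisions add, means are precision-weighted.
        precision = 1.0 / self.var + 1.0 / self.noise_var
        self.mean = (self.mean / self.var + reward / self.noise_var) / precision
        self.var = 1.0 / precision


def choose_arm(arms, rng):
    """Thompson Sampling step: sample every posterior, play the best sample."""
    samples = [arm.sample(rng) for arm in arms]
    return max(range(len(arms)), key=samples.__getitem__)


# Usage: arm 0 is warm-started, arm 1 keeps a wide (uninformative) prior.
rng = random.Random(0)
arms = [GaussianArm(prior_mean=1.0, prior_var=0.25),
        GaussianArm(prior_mean=0.0, prior_var=100.0)]
chosen = choose_arm(arms, rng)
```

A tighter prior makes the posterior move less per observation, which is why biased offline estimates can hurt if they are trusted too much; the deconfounding step described in the abstract addresses that concern and is not modeled here.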

LG · May 29, 2023
Contextual Bandits with Budgeted Information Reveal

Kyra Gan, Esmaeil Keyvanshokooh, Xueqing Liu et al.

Contextual bandit algorithms are commonly used in digital health to recommend personalized treatments. However, to ensure the effectiveness of the treatments, patients are often requested to take actions that have no immediate benefit to them, which we refer to as pro-treatment actions. In practice, clinicians have a limited budget to encourage patients to take these actions and collect additional information. We introduce a novel optimization and learning algorithm to address this problem. The algorithm combines two components: 1) an online primal-dual algorithm that decides the optimal timing to reach out to patients, and 2) a contextual bandit learning algorithm that delivers personalized treatment to the patient. We prove that this algorithm admits a sublinear regret bound. We illustrate the usefulness of this algorithm on both synthetic and real-world data.