ML · Nov 25, 2022
Doubly robust nearest neighbors in factor models
Raaz Dwivedi, Katherine Tian, Sabina Tomkins et al. · harvard, mit
We introduce and analyze an improved variant of nearest neighbors (NN) for estimation with missing data in latent factor models. We consider a matrix completion problem with missing data, where the $(i, t)$-th entry, when observed, is given by its mean $f(u_i, v_t)$ plus mean-zero noise for an unknown function $f$ and latent factors $u_i$ and $v_t$. Prior NN strategies, like unit-unit NN, for estimating the mean $f(u_i, v_t)$ rely on the existence of other rows $j$ with $u_j \approx u_i$. Similarly, the time-time NN strategy relies on the existence of columns $t'$ with $v_{t'} \approx v_t$. These strategies perform poorly when similar rows or similar columns, respectively, are not available. Our estimate is doubly robust to this deficit in two ways: (1) As long as there exist either good row or good column neighbors, our estimate provides a consistent estimate. (2) Furthermore, if both good row and good column neighbors exist, it provides a (near-)quadratic improvement in the non-asymptotic error and admits a significantly narrower asymptotic confidence interval when compared to both unit-unit and time-time NN.
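As a rough illustration of the doubly robust idea, the sketch below combines row and column neighbors to fill in one missing entry. The abstract does not state the estimator's formula, so the combination `X[j, t] + X[i, s] - X[j, s]`, the function name `dr_nn_estimate`, and the hand-picked neighbor sets are assumptions made for illustration, not the authors' implementation. In a noiseless bilinear model $f(u, v) = uv$, the error of one such term is $(u_j - u_i)(v_t - v_s)$, so it vanishes whenever either the row neighbor or the column neighbor is exact.

```python
import numpy as np

def dr_nn_estimate(X, mask, i, t, row_nbrs, col_nbrs):
    """Hypothetical doubly robust NN sketch for entry (i, t).

    Averages X[j, t] + X[i, s] - X[j, s] over row neighbors j and column
    neighbors s for which all three entries are observed (mask is True).
    Returns NaN if no usable neighbor pair exists.
    """
    vals = []
    for j in row_nbrs:
        for s in col_nbrs:
            if mask[j, t] and mask[i, s] and mask[j, s]:
                vals.append(X[j, t] + X[i, s] - X[j, s])
    return float(np.mean(vals)) if vals else float("nan")

# Noiseless toy data (assumed): f(u, v) = u * v, with entry (0, 0) missing.
u = np.array([1.0, 1.0, 3.0])
v = np.array([2.0, 5.0, 7.0])
X = np.outer(u, v)
mask = np.ones_like(X, dtype=bool)
mask[0, 0] = False

# Row 1 matches row 0 exactly (u_1 == u_0), so the estimate is exact
# even though columns 1 and 2 are poor neighbors of column 0.
est_good_rows = dr_nn_estimate(X, mask, 0, 0, row_nbrs=[1], col_nbrs=[1, 2])

# With a poor row neighbor AND a poor column neighbor, the error is the
# product (u_2 - u_0) * (v_0 - v_1) = 2 * (-3) = -6.
est_bad_both = dr_nn_estimate(X, mask, 0, 0, row_nbrs=[2], col_nbrs=[1])
```

Here the true value is $u_0 v_0 = 2$; the first call recovers it and the second does not. In practice the neighbor sets would be chosen from the data rather than supplied by hand.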
LG · Apr 11, 2023
Did we personalize? Assessing personalization by an online reinforcement learning algorithm using resampling
Susobhan Ghosh, Raphael Kim, Prasidh Chhabria et al. · harvard, mit
There is a growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve decisions about when to treat and how to treat based on the user's context (e.g., prior activity level, location, etc.). Online RL is a promising data-driven approach for this problem as it learns based on each user's historical responses and uses that knowledge to personalize these decisions. However, to decide whether the RL algorithm should be included in an "optimized" intervention for real-world deployment, we must assess the data evidence indicating that the RL algorithm is actually personalizing the treatments to its users. Due to the stochasticity in the RL algorithm, one may get a false impression that it is learning in certain states and using this learning to provide specific treatments. We use a working definition of personalization and introduce a resampling-based methodology for investigating whether the personalization exhibited by the RL algorithm is an artifact of the RL algorithm's stochasticity. We illustrate our methodology with a case study by analyzing the data from a physical activity clinical trial called HeartSteps, which included the use of an online RL algorithm. We demonstrate how our approach enhances data-driven truth-in-advertising of algorithm personalization both across all users and within specific users in the study.
ML · Jun 19, 2023
Effect-Invariant Mechanisms for Policy Generalization
Sorawit Saengkyongam, Niklas Pfister, Predrag Klasnja et al.
Policy learning is an important component of many real-world learning systems. A major challenge in policy learning is how to adapt efficiently to unseen environments or tasks. Recently, it has been suggested to exploit invariant conditional distributions to learn models that generalize better to unseen environments. However, assuming invariance of entire conditional distributions (which we call full invariance) may be too strong an assumption in practice. In this paper, we introduce a relaxation of full invariance called effect-invariance (e-invariance for short) and prove that it is sufficient, under suitable assumptions, for zero-shot policy generalization. We also discuss an extension that exploits e-invariance when we have a small sample from the test environment, enabling few-shot policy generalization. Our work does not assume an underlying causal graph or that the data are generated by a structural causal model; instead, we develop testing procedures to test e-invariance directly from data. We present empirical results using simulated data and a mobile health intervention dataset to demonstrate the effectiveness of our approach.
HC · Mar 12, 2024
AI-Assisted Causal Pathway Diagram for Human-Centered Design
Ruican Zhong, Donghoon Shin, Rosemary Meza et al. · uw
This paper explores the integration of causal pathway diagrams (CPD) into human-centered design (HCD), investigating how these diagrams can enhance the early stages of the design process. A dedicated CPD plugin for the online collaborative whiteboard platform Miro was developed to streamline diagram creation and offer real-time AI-driven guidance. Through a user study with designers (N=20), we found that CPD's branching and its emphasis on causal connections supported both divergent and convergent processes during design. CPD can also facilitate communication among stakeholders. Additionally, we found our plugin significantly reduces designers' cognitive workload and increases their creativity during brainstorming, highlighting the implications of AI-assisted tools in supporting creative work and evidence-based designs.
LG · Oct 18, 2024
Harnessing Causality in Reinforcement Learning With Bagged Decision Times
Daiqi Gao, Hsin-Yu Lai, Predrag Klasnja et al.
We consider reinforcement learning (RL) for a class of problems with bagged decision times. A bag contains a finite sequence of consecutive decision times. The transition dynamics are non-Markovian and non-stationary within a bag. All actions within a bag jointly impact a single reward, observed at the end of the bag. For example, in mobile health, multiple activity suggestions in a day collectively affect a user's daily commitment to being active. Our goal is to develop an online RL algorithm to maximize the discounted sum of the bag-specific rewards. To handle non-Markovian transitions within a bag, we utilize an expert-provided causal directed acyclic graph (DAG). Based on the DAG, we construct states as a dynamical Bayesian sufficient statistic of the observed history, which results in Markov state transitions within and across bags. We then formulate this problem as a periodic Markov decision process (MDP) that allows non-stationarity within a period. An online RL algorithm based on Bellman equations for stationary MDPs is generalized to handle periodic MDPs. We show that our constructed state achieves the maximal optimal value function among all state constructions for a periodic MDP. Finally, we evaluate the proposed method on testbed variants built from real data in a mobile health clinical trial.
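To make the objective concrete, the sketch below computes a discounted sum of bag-specific rewards, where each bag contributes a single reward observed at its end. The function name `discounted_bag_return` and the choice to discount once per bag (rather than once per decision time) are assumptions for illustration; the abstract does not specify the discounting convention.

```python
def discounted_bag_return(bag_rewards, gamma):
    """Return sum_k gamma**k * R_k, where R_k is the single reward
    observed at the end of bag k (assumed per-bag discounting)."""
    total = 0.0
    for k, reward in enumerate(bag_rewards):
        total += (gamma ** k) * reward
    return total

# Three bags (e.g., three days), one end-of-bag reward each.
example_return = discounted_bag_return([1.0, 2.0, 4.0], gamma=0.5)
# 1.0 + 0.5 * 2.0 + 0.25 * 4.0 = 3.0
```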
LG · Oct 16, 2025
Active Measuring in Reinforcement Learning With Delayed Negative Effects
Daiqi Gao, Ziping Xu, Aseel Rawashdeh et al.
Measuring states in reinforcement learning (RL) can be costly in real-world settings and may negatively influence future outcomes. We introduce the Actively Observable Markov Decision Process (AOMDP), where an agent not only selects control actions but also decides whether to measure the latent state. The measurement action reveals the true latent state but may have a negative delayed effect on the environment. We show that this reduced uncertainty may provably improve sample efficiency and increase the value of the optimal policy despite these costs. We formulate an AOMDP as a periodic partially observable MDP and propose an online RL algorithm based on belief states. To approximate the belief states, we further propose a sequential Monte Carlo method to jointly approximate the posterior of unknown static environment parameters and unobserved latent states. We evaluate the proposed algorithm in a digital health application, where the agent decides when to deliver digital interventions and when to assess users' health status through surveys.
LG · May 17, 2023
Assessing the Impact of Context Inference Error and Partial Observability on RL Methods for Just-In-Time Adaptive Interventions
Karine Karine, Predrag Klasnja, Susan A. Murphy et al.
Just-in-Time Adaptive Interventions (JITAIs) are a class of personalized health interventions developed within the behavioral science community. JITAIs aim to provide the right type and amount of support by iteratively selecting a sequence of intervention options from a pre-defined set of components in response to each individual's time varying state. In this work, we explore the application of reinforcement learning methods to the problem of learning intervention option selection policies. We study the effect of context inference error and partial observability on the ability to learn effective policies. Our results show that the propagation of uncertainty from context inferences is critical to improving intervention efficacy as context uncertainty increases, while policy gradient algorithms can provide remarkable robustness to partially observed behavioral state information.
ML · Feb 14, 2022
Counterfactual inference in sequential experiments
Raaz Dwivedi, Katherine Tian, Sabina Tomkins et al.
We consider after-study statistical inference for sequentially designed experiments wherein multiple units are assigned treatments for multiple time points using treatment policies that adapt over time. Our goal is to provide inference guarantees for the counterfactual mean at the smallest possible scale -- mean outcome under different treatments for each unit and each time -- with minimal assumptions on the adaptive treatment policy. Without any structural assumptions on the counterfactual means, this challenging task is infeasible due to more unknowns than observed data points. To make progress, we introduce a latent factor model over the counterfactual means that serves as a non-parametric generalization of the non-linear mixed effects model and the bilinear latent factor model considered in prior works. For estimation, we use a non-parametric method, namely a variant of nearest neighbors, and establish a non-asymptotic high probability error bound for the counterfactual mean for each unit and each time. Under regularity conditions, this bound leads to asymptotically valid confidence intervals for the counterfactual mean as the number of units and time points grows to $\infty$ together at suitable rates. We illustrate our theory via several simulations and a case study involving data from a mobile health clinical trial HeartSteps.
LG · Jul 31, 2020
IntelligentPooling: Practical Thompson Sampling for mHealth
Sabina Tomkins, Peng Liao, Predrag Klasnja et al.
In mobile health (mHealth), smart devices deliver behavioral treatments repeatedly over time to a user with the goal of helping the user adopt and maintain healthy behaviors. Reinforcement learning appears ideal for learning how to optimally make these sequential treatment decisions. However, significant challenges must be overcome before reinforcement learning can be effectively deployed in a mobile healthcare setting. In this work we are concerned with the following challenges: 1) individuals who are in the same context can exhibit differential response to treatments, 2) only a limited amount of data is available for learning on any one individual, and 3) responses to treatment are non-stationary. To address these challenges we generalize Thompson-Sampling bandit algorithms to develop IntelligentPooling. IntelligentPooling learns personalized treatment policies, thus addressing challenge one. To address the second challenge, IntelligentPooling updates each user's degree of personalization while making use of available data on other users to speed up learning. Lastly, IntelligentPooling allows responsivity to vary as a function of a user's time since beginning treatment, thus addressing challenge three. We show that IntelligentPooling achieves an average of 26% lower regret than the state of the art. We demonstrate the promise of this approach and its ability to learn from even a small group of users in a live clinical trial.
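For readers unfamiliar with the baseline being generalized, here is a minimal sketch of standard Thompson sampling for a Bernoulli bandit with Beta priors. This is the textbook single-user algorithm, not IntelligentPooling: it has no pooling across users, no context, and no non-stationarity. The function name `thompson_bernoulli`, the reward probabilities, and the seed are assumptions for illustration.

```python
import numpy as np

def thompson_bernoulli(true_probs, n_rounds, seed=0):
    """Textbook Beta-Bernoulli Thompson sampling (illustrative only)."""
    rng = np.random.default_rng(seed)
    k = len(true_probs)
    alpha = np.ones(k)   # Beta(1, 1) prior: 1 + number of successes per arm
    beta = np.ones(k)    # 1 + number of failures per arm
    pulls = np.zeros(k, dtype=int)
    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its posterior,
        # then play the arm whose draw is largest.
        samples = rng.beta(alpha, beta)
        arm = int(np.argmax(samples))
        reward = int(rng.random() < true_probs[arm])
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return alpha, beta, pulls

# Two hypothetical treatments with success rates 0.2 and 0.8.
alpha, beta, pulls = thompson_bernoulli([0.2, 0.8], n_rounds=2000)
```

Because actions are chosen by sampling from the posterior, exploration shrinks automatically as evidence accumulates, and most pulls end up on the better arm.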
ST · Jul 23, 2020
Batch Policy Learning in Average Reward Markov Decision Processes
Peng Liao, Zhengling Qi, Runzhe Wan et al.
We consider the batch (off-line) policy learning problem in the infinite horizon Markov Decision Process. Motivated by mobile health applications, we focus on learning a policy that maximizes the long-term average reward. We propose a doubly robust estimator for the average reward and show that it achieves semiparametric efficiency. Further, we develop an optimization algorithm to compute the optimal policy in a parameterized stochastic policy class. The performance of the estimated policy is measured by the difference between the optimal average reward in the policy class and the average reward of the estimated policy, and we establish a finite-sample regret guarantee. The performance of the method is illustrated by simulation studies and an analysis of a mobile health study promoting physical activity.
HC · Apr 23, 2020
The Micro-Randomized Trial for Developing Digital Interventions: Experimental Design Considerations
Ashley E. Walton, Linda M. Collins, Predrag Klasnja et al.
Just-in-time adaptive interventions (JITAIs) are time-varying adaptive interventions that use frequent opportunities for the intervention to be adapted such as weekly, daily, or even many times a day. This high intensity of adaptation is facilitated by the ability of digital technology to continuously collect information about an individual's current context and deliver treatments adapted to this information. The micro-randomized trial (MRT) has emerged for use in informing the construction of JITAIs. MRTs operate in, and take advantage of, the rapidly time-varying digital intervention environment. MRTs can be used to address research questions about whether and under what circumstances particular components of a JITAI are effective, with the ultimate objective of developing effective and efficient components. The purpose of this article is to clarify why, when, and how to use MRTs; to highlight elements that must be considered when designing and implementing an MRT; and to discuss the possibilities this emerging optimization trial design offers for future research in the behavioral sciences, education, and other fields. We briefly review key elements of JITAIs, and then describe three case studies of MRTs, each of which highlights research questions that can be addressed using the MRT and experimental design considerations that might arise. We also discuss a variety of considerations that go into planning and designing an MRT, using the case studies as examples.
HC · Mar 30, 2020
Translating Behavioral Theory into Technological Interventions: Case Study of an mHealth App to Increase Self-reporting of Substance-Use Related Data
Mashfiqui Rabbi, Meredith Philyaw-Kotov, Jinseok Li et al.
Mobile health (mHealth) applications are a powerful medium for providing behavioral interventions, and systematic reviews suggest that theory-based interventions are more effective. However, how exactly theoretical concepts should be translated into features of technological interventions is often not clear. There is a gulf between the abstract nature of psychological theory and the concreteness of the designs needed to build health technologies. In this paper, we use SARA, a mobile app we developed to support substance-use research among adolescents and young adults, as a case study of a process of translating behavioral theory into mHealth intervention design. SARA was designed to increase adherence to daily self-report in longitudinal epidemiological studies. To achieve this goal, we implemented a number of constructs from the operant conditioning theory. We describe our design process and discuss how we operationalized theoretical constructs in the light of design constraints, user feedback, and empirical data from four formative studies.
LG · Feb 23, 2020
Rapidly Personalizing Mobile Health Treatment Policies with Limited Data
Sabina Tomkins, Peng Liao, Predrag Klasnja et al.
In mobile health (mHealth), reinforcement learning algorithms that adapt to one's context without learning personalized policies might fail to distinguish between the needs of individuals. Yet the high amount of noise due to the in situ delivery of mHealth interventions can cripple the ability of an algorithm to learn when given access to only a single user's data, making personalization challenging. We present IntelligentPooling, which learns personalized policies via an adaptive, principled use of other users' data. We show that IntelligentPooling achieves an average of 26% lower regret than the state of the art across all generative models. Additionally, we inspect the behavior of this approach in a live clinical trial, demonstrating its ability to learn from even a small group of users.
LG · Dec 30, 2019
Off-Policy Estimation of Long-Term Average Outcomes with Applications to Mobile Health
Peng Liao, Predrag Klasnja, Susan Murphy
Due to the recent advancements in wearables and sensing technology, health scientists are increasingly developing mobile health (mHealth) interventions. In mHealth interventions, mobile devices are used to deliver treatment to individuals as they go about their daily lives. These treatments are generally designed to impact a near-time, proximal outcome such as stress or physical activity. The mHealth intervention policies, often called just-in-time adaptive interventions, are decision rules that map an individual's current state (e.g., individual's past behaviors as well as current observations of time, location, social activity, stress and urges to smoke) to a particular treatment at each of many time points. The vast majority of current mHealth interventions deploy expert-derived policies. In this paper, we provide an approach for conducting inference about the performance of one or more such policies using historical data collected under a possibly different policy. Our measure of performance is the average of proximal outcomes over a long time period should the particular mHealth policy be followed. We provide an estimator as well as confidence intervals. This work is motivated by HeartSteps, an mHealth physical activity intervention.
LG · Sep 8, 2019
Personalized HeartSteps: A Reinforcement Learning Algorithm for Optimizing Physical Activity
Peng Liao, Kristjan Greenewald, Predrag Klasnja et al.
With the recent evolution of mobile health technologies, health scientists are increasingly interested in developing just-in-time adaptive interventions (JITAIs), typically delivered via notifications on a mobile device and designed to help the user prevent negative health outcomes and promote the adoption and maintenance of healthy behaviors. A JITAI involves a sequence of decision rules (i.e., treatment policy) that takes the user's current context as input and specifies whether and what type of intervention should be provided at the moment. In this paper, we develop a Reinforcement Learning (RL) algorithm that continuously learns and improves the treatment policy embedded in the JITAI as the data is being collected from the user. This work is motivated by our collaboration on designing the RL algorithm in HeartSteps V2 based on data from HeartSteps V1. HeartSteps is a physical activity mobile health application. The RL algorithm developed in this paper is being used in HeartSteps V2 to decide, five times per day, whether to deliver a context-tailored activity suggestion.
LG · Dec 2, 2018
Personalizing Intervention Probabilities By Pooling
Sabina Tomkins, Predrag Klasnja, Susan Murphy
In many mobile health interventions, treatments should only be delivered in a particular context, for example when a user is currently stressed, walking or sedentary. Even in an optimal context, concerns about user burden can restrict which treatments are sent. To diffuse the treatment delivery over times when a user is in a desired context, it is critical to predict the future number of times the context will occur. The focus of this paper is on whether personalization can improve predictions in these settings. Though the variance between individuals' behavioral patterns suggests that personalization should be useful, the amount of individual-level data limits its capabilities. Thus, we investigate several methods which pool data across users to overcome these deficiencies and find that pooling lowers the overall error rate relative to both personalized and batch approaches.