LGNov 16, 2023
Adaptive Interventions with User-Defined Goals for Health Behavior ChangeAishwarya Mandyam, Matthew Jörke, William Denton et al.
Promoting healthy lifestyle behaviors remains a major public health concern, particularly due to their crucial role in preventing chronic conditions such as cancer, heart disease, and type 2 diabetes. Mobile health applications present a promising avenue for low-cost, scalable health behavior change promotion. Researchers are increasingly exploring adaptive algorithms that personalize interventions to each person's unique context. However, in empirical studies, mobile health applications often suffer from small effect sizes and low adherence rates, particularly in comparison to human coaching. Tailoring advice to a person's unique goals, preferences, and life circumstances is a critical component of health coaching that has been underutilized in adaptive algorithms for mobile health interventions. To address this, we introduce a new Thompson sampling algorithm that can accommodate personalized reward functions (i.e., goals, preferences, and constraints), while also leveraging data sharing across individuals to more quickly be able to provide effective recommendations. We prove that our modification incurs only a constant penalty on cumulative regret while preserving the sample complexity benefits of data sharing. We present empirical results on synthetic and semi-synthetic physical activity simulators, where in the latter we conducted an online survey to solicit preference data relating to physical activity, which we use to construct realistic reward models that leverages historical data from another study. Our algorithm achieves substantial performance improvements compared to baselines that do not share data or do not optimize for individualized rewards.
MLJun 14, 2023
Bayesian Non-linear Latent Variable Modeling via Random Fourier FeaturesMichael Minyi Zhang, Gregory W. Gundersen, Barbara E. Engelhardt
The Gaussian process latent variable model (GPLVM) is a popular probabilistic method used for nonlinear dimension reduction, matrix factorization, and state-space modeling. Inference for GPLVMs is computationally tractable only when the data likelihood is Gaussian. Moreover, inference for GPLVMs has typically been restricted to obtaining maximum a posteriori point estimates, which can lead to overfitting, or variational approximations, which mischaracterize the posterior uncertainty. Here, we present a method to perform Markov chain Monte Carlo (MCMC) inference for generalized Bayesian nonlinear latent variable modeling. The crucial insight necessary to generalize GPLVMs to arbitrary observation models is that we approximate the kernel function in the Gaussian process mappings with random Fourier features; this allows us to compute the gradient of the posterior in closed form with respect to the latent variables. We show that we can generalize GPLVMs to non-Gaussian observations, such as Poisson, negative binomial, and multinomial distributions, using our random feature latent variable model (RFLVM). Our generalized RFLVMs perform on par with state-of-the-art latent variable models on a wide range of applications, including motion capture, images, and text data for the purpose of estimating the latent structure and imputing the missing data of these complex data sets.
MEOct 12, 2021Code
Nonnegative spatial factorizationF. William Townes, Barbara E. Engelhardt
Gaussian processes are widely used for the analysis of spatial data due to their nonparametric flexibility and ability to quantify uncertainty, and recently developed scalable approximations have facilitated application to massive datasets. For multivariate outcomes, linear models of coregionalization combine dimension reduction with spatial correlation. However, their real-valued latent factors and loadings are difficult to interpret because, unlike nonnegative models, they do not recover a parts-based representation. We present nonnegative spatial factorization (NSF), a spatially-aware probabilistic dimension reduction model that naturally encourages sparsity. We compare NSF to real-valued spatial factorizations such as MEFISTO and nonspatial dimension reduction methods using simulations and high-dimensional spatial transcriptomics data. NSF identifies generalizable spatial patterns of gene expression. Since not all patterns of gene expression are spatial, we also propose a hybrid extension of NSF that combines spatial and nonspatial components, enabling quantification of spatial importance for both observations and features. A TensorFlow implementation of NSF is available from https://github.com/willtownes/nsf-paper .
LGMar 13, 2023
Kernel Density Bayesian Inverse Reinforcement LearningAishwarya Mandyam, Didong Li, Jiayu Yao et al.
Inverse reinforcement learning (IRL) methods infer an agent's reward function using demonstrations of expert behavior. A Bayesian IRL approach models a distribution over candidate reward functions, capturing a degree of uncertainty in the inferred reward function. This is critical in some applications, such as those involving clinical data. Typically, Bayesian IRL algorithms require large demonstration datasets, which may not be available in practice. In this work, we incorporate existing domain-specific data to achieve better posterior concentration rates. We study a common setting in clinical and biological applications where we have access to expert demonstrations and known reward functions for a set of training tasks. Our aim is to learn the reward function of a new test task given limited expert demonstrations. Existing Bayesian IRL methods impose restrictions on the form of input data, thus limiting the incorporation of training task data. To better leverage information from training tasks, we introduce kernel density Bayesian inverse reinforcement learning (KD-BIRL). Our approach employs a conditional kernel density estimator, which uses the known reward functions of the training tasks to improve the likelihood estimation across a range of reward functions and demonstration samples. Our empirical results highlight KD-BIRL's faster concentration rate in comparison to baselines, particularly in low test task expert demonstration data regimes. Additionally, we are the first to provide theoretical guarantees of posterior concentration for a Bayesian IRL algorithm. Taken together, this work introduces a principled and theoretically grounded framework that enables Bayesian IRL to be applied across a variety of domains.
LGJul 26, 2025
PERRY: Policy Evaluation with Confidence Intervals using Auxiliary DataAishwarya Mandyam, Jason Meng, Ge Gao et al.
Off-policy evaluation (OPE) methods aim to estimate the value of a new reinforcement learning (RL) policy prior to deployment. Recent advances have shown that leveraging auxiliary datasets, such as those synthesized by generative models, can improve the accuracy of these value estimates. Unfortunately, such auxiliary datasets may also be biased, and existing methods for using data augmentation for OPE in RL lack principled uncertainty quantification. In high stakes settings like healthcare, reliable uncertainty estimates are important for comparing policy value estimates. In this work, we propose two approaches to construct valid confidence intervals for OPE when using data augmentation. The first provides a confidence interval over the policy performance conditioned on a particular initial state $V^π(s_0)$-- such intervals are particularly important for human-centered applications. To do so we introduce a new conformal prediction method for high dimensional state MDPs. Second, we consider the more common task of estimating the average policy performance over many initial states; to do so we draw on ideas from doubly robust estimation and prediction powered inference. Across simulators spanning robotics, healthcare and inventory management, and a real healthcare dataset from MIMIC-IV, we find that our methods can use augmented data and still consistently produce intervals that cover the ground truth values, unlike previously proposed methods.
AIMar 28, 2025
Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHFSyrine Belakaria, Joshua Kazdan, Charles Marx et al.
Reinforcement learning from human feedback (RLHF) has become a cornerstone of the training and alignment pipeline for large language models (LLMs). Recent advances, such as direct preference optimization (DPO), have simplified the preference learning step. However, collecting preference data remains a challenging and costly process, often requiring expert annotation. This cost can be mitigated by carefully selecting the data points presented for annotation. In this work, we propose an active learning approach to efficiently select prompt and preference pairs using a risk assessment strategy based on the Sharpe Ratio. To address the challenge of unknown preferences prior to annotation, our method evaluates the gradients of all potential preference annotations to assess their impact on model updates. These gradient-based evaluations enable risk assessment of data points regardless of the annotation outcome. By leveraging the DPO loss derivations, we derive a closed-form expression for computing these Sharpe ratios on a per-tuple basis, ensuring our approach remains both tractable and computationally efficient. We also introduce two variants of our method, each making different assumptions about prior information. Experimental results demonstrate that our method outperforms the baseline by up to 5% in win rates against the chosen completion with limited human preference data across several language models and real-world datasets.
LGDec 11, 2024
CANDOR: Counterfactual ANnotated DOubly Robust Off-Policy EvaluationAishwarya Mandyam, Shengpu Tang, Jiayu Yao et al.
Off-policy evaluation (OPE) provides safety guarantees by estimating the performance of a policy before deployment. Recent work introduced IS+, an importance sampling (IS) estimator that uses expert-annotated counterfactual samples to improve behavior dataset coverage. However, IS estimators are known to have high variance; furthermore, the performance of IS+ deteriorates when annotations are imperfect. In this work, we propose a family of OPE estimators inspired by the doubly robust (DR) principle. A DR estimator combines IS with a reward model estimate, known as the direct method (DM), and offers favorable statistical guarantees. We propose three strategies for incorporating counterfactual annotations into a DR-inspired estimator and analyze their properties under various realistic settings. We prove that using imperfect annotations in the DM part of the estimator best leverages the annotations, as opposed to using them in the IS part. To support our theoretical findings, we evaluate the proposed estimators in three contextual bandit environments. Our empirical results show that when the reward model is misspecified and the annotations are imperfect, it is most beneficial to use the annotations only in the DM portion of a DR estimator. Based on these theoretical and empirical insights, we provide a practical guide for using counterfactual annotations in different realistic settings.
LGNov 21, 2025
APRIL: Annotations for Policy evaluation with Reliable Inference from LLMsAishwarya Mandyam, Kalyani Limaye, Barbara E. Engelhardt et al.
Off-policy evaluation (OPE) estimates the value of a contextual bandit policy prior to deployment. As such, OPE plays a critical role in ensuring safety in high-stakes domains such as healthcare. However, standard OPE approaches are limited by the size and coverage of the behavior dataset. While previous work has explored using expert-labeled counterfactual annotations to enhance dataset coverage, obtaining such annotations is expensive, limiting the scalability of prior approaches. We propose leveraging large language models (LLMs) to generate counterfactual annotations for OPE in medical domains. Our method uses domain knowledge to guide LLMs in predicting how key clinical features evolve under alternate treatments. These predicted features can then be transformed using known reward functions to create counterfactual annotations. We first evaluate the ability of several LLMs to predict clinical features across two patient subsets in MIMIC-IV, finding that state-of-the-art LLMs achieve comparable performance. Building on this capacity to predict clinical features, we generate LLM-based counterfactual annotations and incorporate them into an OPE estimator. Our empirical results analyze the benefits of counterfactual annotations under varying degrees of shift between the behavior and target policies. We find that in most cases, the LLM-based counterfactual annotations significantly improve OPE estimates up to a point. We provide an entropy-based metric to identify when additional annotations cease to be useful. Our results demonstrate that LLM-based counterfactual annotations offer a scalable approach for addressing coverage limitations in healthcare datasets, enabling safer deployment of decision-making policies in clinical settings.
LGOct 13, 2021
Variance Minimization in the Wasserstein Space for Invariant Causal PredictionGuillaume Martinet, Alexander Strzalkowski, Barbara E. Engelhardt
Selecting powerful predictors for an outcome is a cornerstone task for machine learning. However, some types of questions can only be answered by identifying the predictors that causally affect the outcome. A recent approach to this causal inference problem leverages the invariance property of a causal mechanism across differing experimental environments (Peters et al., 2016; Heinze-Deml et al., 2018). This method, invariant causal prediction (ICP), has a substantial computational defect -- the runtime scales exponentially with the number of possible causal variables. In this work, we show that the approach taken in ICP may be reformulated as a series of nonparametric tests that scales linearly in the number of predictors. Each of these tests relies on the minimization of a novel loss function -- the Wasserstein variance -- that is derived from tools in optimal transport theory and is used to quantify distributional variability across environments. We prove under mild assumptions that our method is able to recover the set of identifiable direct causes, and we demonstrate in our experiments that it is competitive with other benchmark causal discovery algorithms.
MLMar 26, 2021
Active multi-fidelity Bayesian online changepoint detectionGregory W. Gundersen, Diana Cai, Chuteng Zhou et al.
Online algorithms for detecting changepoints, or abrupt shifts in the behavior of a time series, are often deployed with limited resources, e.g., to edge computing settings such as mobile phones or industrial sensors. In these scenarios it may be beneficial to trade the cost of collecting an environmental measurement against the quality or "fidelity" of this measurement and how the measurement affects changepoint estimation. For instance, one might decide between inertial measurements or GPS to determine changepoints for motion. A Bayesian approach to changepoint detection is particularly appealing because we can represent our posterior uncertainty about changepoints and make active, cost-sensitive decisions about data fidelity to reduce this posterior uncertainty. Moreover, the total cost could be dramatically lowered through active fidelity switching, while remaining robust to changes in data distribution. We propose a multi-fidelity approach that makes cost-sensitive decisions about which data fidelity to collect based on maximizing information gain with respect to changepoints. We evaluate this framework on synthetic, video, and audio data and show that this information-based approach results in accurate predictions while reducing total cost.
MLJun 19, 2020
Latent variable modeling with random featuresGregory W. Gundersen, Michael Minyi Zhang, Barbara E. Engelhardt
Gaussian process-based latent variable models are flexible and theoretically grounded tools for nonlinear dimension reduction, but generalizing to non-Gaussian data likelihoods within this nonlinear framework is statistically challenging. Here, we use random features to develop a family of nonlinear dimension reduction models that are easily extensible to non-Gaussian data likelihoods; we call these random feature latent variable models (RFLVMs). By approximating a nonlinear relationship between the latent space and the observations with a function that is linear with respect to random features, we induce closed-form gradients of the posterior distribution with respect to the latent variable. This allows the RFLVM framework to support computationally tractable nonlinear latent variable models for a variety of data likelihoods in the exponential family without specialized derivations. Our generalized RFLVMs produce results comparable with other state-of-the-art dimension reduction methods on diverse types of data, including neural spike train recordings, images, and text data.
LGMar 17, 2020
Nonparametric Deconvolution ModelsAllison J. B. Chaney, Archit Verma, Young-suk Lee et al.
We describe nonparametric deconvolution models (NDMs), a family of Bayesian nonparametric models for collections of data in which each observation is the average over the features from heterogeneous particles. For example, these types of data are found in elections, where we observe precinct-level vote tallies (observations) of individual citizens' votes (particles) across each of the candidates or ballot measures (features), where each voter is part of a specific voter cohort or demographic (factor). Like the hierarchical Dirichlet process, NDMs rely on two tiers of Dirichlet processes to explain the data with an unknown number of latent factors; each observation is modeled as a weighted average of these latent factors. Unlike existing models, NDMs recover how factor distributions vary locally for each observation. This uniquely allows NDMs both to deconvolve each observation into its constituent factors, and also to describe how the factor distributions specific to each observation vary across observations and deviate from the corresponding global factors. We present variational inference techniques for this family of models and study its performance on simulated data and voting data from California. We show that including local factors improves estimates of global factors and provides a novel scaffold for exploring data.
MLJun 1, 2019
Patient-Specific Effects of Medication Using Latent Force Models with Gaussian ProcessesLi-Fang Cheng, Bianca Dumitrascu, Michael Zhang et al.
Multi-output Gaussian processes (GPs) are a flexible Bayesian nonparametric framework that has proven useful in jointly modeling the physiological states of patients in medical time series data. However, capturing the short-term effects of drugs and therapeutic interventions on patient physiological state remains challenging. We propose a novel approach that models the effect of interventions as a hybrid Gaussian process composed of a GP capturing patient physiology convolved with a latent force model capturing effects of treatments on specific physiological features. This convolution of a multi-output GP with a GP including a causal time-marked kernel leads to a well-characterized model of the patients' physiological state responding to interventions. We show that our model leads to analytically tractable cross-covariance functions, allowing scalable inference. Our hierarchical model includes estimates of patient-specific effects but allows sharing of support across patients. Our approach achieves competitive predictive performance on challenging hospital data, where we recover patient-specific response to the administration of three common drugs: one antihypertensive drug and two anticoagulants.
MLMay 24, 2019
Sequential Gaussian Processes for Online Learning of Nonstationary FunctionsMichael Minyi Zhang, Bianca Dumitrascu, Sinead A. Williamson et al.
Many machine learning problems can be framed in the context of estimating functions, and often these are time-dependent functions that are estimated in real-time as observations arrive. Gaussian processes (GPs) are an attractive choice for modeling real-valued nonlinear functions due to their flexibility and uncertainty quantification. However, the typical GP regression model suffers from several drawbacks: 1) Conventional GP inference scales $O(N^{3})$ with respect to the number of observations; 2) Updating a GP model sequentially is not trivial; and 3) Covariance kernels typically enforce stationarity constraints on the function, while GPs with non-stationary covariance kernels are often intractable to use in practice. To overcome these issues, we propose a sequential Monte Carlo algorithm to fit infinite mixtures of GPs that capture non-stationary behavior while allowing for online, distributed inference. Our approach empirically improves performance over state-of-the-art methods for online GP estimation in the presence of non-stationarity in time-series data. To demonstrate the utility of our proposed online Gaussian process mixture-of-experts approach in applied settings, we show that we can sucessfully implement an optimization algorithm using online Gaussian process bandits.
CYOct 30, 2017
How Algorithmic Confounding in Recommendation Systems Increases Homogeneity and Decreases UtilityAllison J. B. Chaney, Brandon M. Stewart, Barbara E. Engelhardt
Recommendation systems are ubiquitous and impact many domains; they have the potential to influence product consumption, individuals' perceptions of the world, and life-altering decisions. These systems are often evaluated or trained with data from users already exposed to algorithmic recommendations; this creates a pernicious feedback loop. Using simulations, we demonstrate how using data confounded in this way homogenizes user behavior without increasing utility.
LGJan 9, 2017
Coupled Compound Poisson FactorizationMehmet E. Basbug, Barbara E. Engelhardt
We present a general framework, the coupled compound Poisson factorization (CCPF), to capture the missing-data mechanism in extremely sparse data sets by coupling a hierarchical Poisson factorization with an arbitrary data-generating model. We derive a stochastic variational inference algorithm for the resulting model and, as examples of our framework, implement three different data-generating models---a mixture model, linear regression, and factor analysis---to robustly model non-random missing data in the context of clustering, prediction, and matrix factorization. In all three cases, we test our framework against models that ignore the missing-data mechanism on large scale studies with non-random missing data, and we show that explicitly modeling the missing-data mechanism substantially improves the quality of the results, as measured using data log likelihood on a held-out test set.
LGAug 17, 2016
Dynamic Collaborative Filtering with Compound Poisson FactorizationGhassen Jerfel, Mehmet E. Basbug, Barbara E. Engelhardt
Model-based collaborative filtering analyzes user-item interactions to infer latent factors that represent user preferences and item characteristics in order to predict future interactions. Most collaborative filtering algorithms assume that these latent factors are static, although it has been shown that user preferences and item perceptions drift over time. In this paper, we propose a conjugate and numerically stable dynamic matrix factorization (DCPF) based on compound Poisson matrix factorization that models the smoothly drifting latent factors using Gamma-Markov chains. We propose a numerically stable Gamma chain construction, and then present a stochastic variational inference approach to estimate the parameters of our model. We apply our model to time-stamped ratings data sets: Netflix, Yelp, and Last.fm, where DCPF achieves a higher predictive accuracy than state-of-the-art static and dynamic factorization models.
LGApr 13, 2016
Hierarchical Compound Poisson FactorizationMehmet E. Basbug, Barbara E. Engelhardt
Non-negative matrix factorization models based on a hierarchical Gamma-Poisson structure capture user and item behavior effectively in extremely sparse data sets, making them the ideal choice for collaborative filtering applications. Hierarchical Poisson factorization (HPF) in particular has proved successful for scalable recommendation systems with extreme sparsity. HPF, however, suffers from a tight coupling of sparsity model (absence of a rating) and response model (the value of the rating), which limits the expressiveness of the latter. Here, we introduce hierarchical compound Poisson factorization (HCPF) that has the favorable Gamma-Poisson structure and scalability of HPF to high-dimensional extremely sparse matrices. More importantly, HCPF decouples the sparsity model from the response model, allowing us to choose the most suitable distribution for the response. HCPF can capture binary, non-negative discrete, non-negative continuous, and zero-inflated continuous responses. We compare HCPF with HPF on nine discrete and three continuous data sets and conclude that HCPF captures the relationship between sparsity and response better than HPF.
STMar 17, 2016
Fast moment estimation for generalized latent Dirichlet modelsShiwen Zhao, Barbara E. Engelhardt, Sayan Mukherjee et al.
We develop a generalized method of moments (GMM) approach for fast parameter estimation in a new class of Dirichlet latent variable models with mixed data types. Parameter estimation via GMM has been demonstrated to have computational and statistical advantages over alternative methods, such as expectation maximization, variational inference, and Markov chain Monte Carlo. The key computational advan- tage of our method (MELD) is that parameter estimation does not require instantiation of the latent variables. Moreover, a representational advantage of the GMM approach is that the behavior of the model is agnostic to distributional assumptions of the observations. We derive population moment conditions after marginalizing out the sample-specific Dirichlet latent variables. The moment conditions only depend on component mean parameters. We illustrate the utility of our approach on simulated data, comparing results from MELD to alternative methods, and we show the promise of our approach through the application of MELD to several data sets.
LGFeb 16, 2016
Unsupervised Domain Adaptation Using Approximate Label MatchingJordan T. Ash, Robert E. Schapire, Barbara E. Engelhardt
Domain adaptation addresses the problem created when training data is generated by a so-called source distribution, but test data is generated by a significantly different target distribution. In this work, we present approximate label matching (ALM), a new unsupervised domain adaptation technique that creates and leverages a rough labeling on the test samples, then uses these noisy labels to learn a transformation that aligns the source and target samples. We show that the transformation estimated by ALM has favorable properties compared to transformations estimated by other methods, which do not use any kind of target labeling. Our model is regularized by requiring that a classifier trained to discriminate source from transformed target samples cannot distinguish between the two. We experiment with ALM on simulated and real data, and show that it outperforms techniques commonly used in the field.
MENov 7, 2014
Differential gene co-expression networks via Bayesian biclustering modelsChuan Gao, Shiwen Zhao, Ian C. McDowell et al.
Identifying latent structure in large data matrices is essential for exploring biological processes. Here, we consider recovering gene co-expression networks from gene expression data, where each network encodes relationships between genes that are locally co-regulated by shared biological mechanisms. To do this, we develop a Bayesian statistical model for biclustering to infer subsets of co-regulated genes whose covariation may be observed in only a subset of the samples. Our biclustering method, BicMix, has desirable properties, including allowing overcomplete representations of the data, computational tractability, and jointly modeling unknown confounders and biological signals. Compared with related biclustering methods, BicMix recovers latent structure with higher precision across diverse simulation scenarios. Further, we develop a method to recover gene co-expression networks from the estimated sparse biclustering matrices. We apply BicMix to breast cancer gene expression data and recover a gene co-expression network that is differential across ER+ and ER- samples.