Diego Herrero-Quevedo

LGMar 7, 2025

Black Box Causal Inference: Effect Estimation via Meta Prediction

Lucius E. J. Bynum, Aahlad Manas Puli, Diego Herrero-Quevedo et al.

Causal inference and the estimation of causal effects plays a central role in decision-making across many areas, including healthcare and economics. Estimating causal effects typically requires an estimator that is tailored to each problem of interest. But developing estimators can take significant effort for even a single causal inference setting. For example, algorithms for regression-based estimators, propensity score methods, and doubly robust methods were designed across several decades to handle causal estimation with observed confounders. Similarly, several estimators have been developed to exploit instrumental variables (IVs), including two-stage least-squares (TSLS), control functions, and the method-of-moments. In this work, we instead frame causal inference as a dataset-level prediction problem, offloading algorithm design to the learning process. The approach we introduce, called black box causal inference (BBCI), builds estimators in a black-box manner by learning to predict causal effects from sampled dataset-effect pairs. We demonstrate accurate estimation of average treatment effects (ATEs) and conditional average treatment effects (CATEs) with BBCI across several causal inference problems with known identification, including problems with less developed estimators.

LGMar 19, 2025

A Vector-Quantized Foundation Model for Patient Behavior Monitoring

Rodrigo Oliver, Josué Pérez-Sabater, Leire Paz-Arbaizar et al.

Foundation models have achieved remarkable success across various domains, yet their adoption in healthcare remains limited. While significant advances have been made in medical imaging, genetic biomarkers, and time series from electronic health records, the potential of foundation models for patient behavior monitoring through personal digital devices remains underexplored. The data generated by these devices are inherently heterogeneous, multisource, and often exhibit high rates of missing data, posing unique challenges. This paper introduces a novel foundation model based on a modified vector quantized variational autoencoder, specifically designed to process real-world data from smartphones and wearable devices. We leveraged the discrete latent representation of this model to effectively perform two downstream tasks, suicide risk assessment and emotional state prediction, on different held-out clinical cohorts without the need of fine-tuning. We also highlight the existence of a trade-off between discrete and continuous latent structures, suggesting that hybrid models may be optimal for balancing accuracy across various supervised and unsupervised tasks.

Diego Herrero-Quevedo

2 Papers