ML LG · Oct 30, 2025

Assessment of the conditional exchangeability assumption in causal machine learning models: a simulation study

Gerard T. Portela, Jason B. Gibbons, Sebastian Schneeweiss, Rishi J. Desai

arXiv:2510.26700v1 · h-index: 50

Originality: Synthesis-oriented

AI Analysis

This work addresses the credibility of causal inference from observational data, a concern for researchers and practitioners alike. It is incremental, building on existing diagnostic methods.

The study examines how unmeasured confounding, which violates the conditional exchangeability assumption, affects causal machine learning models that estimate individualized treatment effects. It finds that causal forest and X-learner models failed to recover true treatment effect heterogeneity and sometimes falsely indicated heterogeneity where none existed, while negative control outcomes successfully identified the subgroups affected by confounding.
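For reference, the assumption and the negative control outcome (NCO) logic can be written in standard potential-outcomes form. The notation here ($A$ binary treatment, $X$ measured covariates, $Y^{a}$ and $N^{a}$ potential outcomes, $N$ the NCO, $S$ a covariate-defined subgroup) is assumed for illustration and may differ from the paper's.

```latex
% Conditional exchangeability: no unmeasured confounding given X
Y^{a} \perp\!\!\!\perp A \mid X, \qquad a \in \{0, 1\}

% Individualized (conditional average) treatment effect targeted by causal ML models
\tau(x) = \mathbb{E}\left[\, Y^{1} - Y^{0} \mid X = x \,\right]

% NCO: A has no effect on N, so N^{1} = N^{0}. If exchangeability also holds for N
% given X, then by consistency and exchangeability
\mathbb{E}[N \mid A = 1, X = x] = \mathbb{E}[N^{1} \mid X = x]
  = \mathbb{E}[N^{0} \mid X = x] = \mathbb{E}[N \mid A = 0, X = x]

% so the subgroup-level NCO contrast should be zero for every subgroup S:
\theta_N(S) = \mathbb{E}\left\{ \mathbb{E}[N \mid A = 1, X] - \mathbb{E}[N \mid A = 0, X] \mid X \in S \right\} = 0
```

A clearly nonzero estimate of $\theta_N(S)$ therefore suggests unmeasured confounding within $S$, provided $N$ shares the confounding structure of $Y$. The abstract notes that NCOs remained informative even when these ideal assumptions held only imperfectly.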

Observational studies developing causal machine learning (ML) models for the prediction of individualized treatment effects (ITEs) seldom conduct empirical evaluations to assess the conditional exchangeability assumption. We aimed to evaluate the performance of these models under conditional exchangeability violations and the utility of negative control outcomes (NCOs) as a diagnostic. We conducted a simulation study to examine confounding bias in ITE estimates generated by causal forest and X-learner models under varying conditions, including the presence or absence of true heterogeneity. We simulated data to reflect real-world scenarios with differing levels of confounding, sample sizes, and NCO confounding structures. We then estimated and compared subgroup-level treatment effects on the primary outcome and NCOs across settings with and without unmeasured confounding. When conditional exchangeability was violated, causal forest and X-learner models failed to recover true treatment effect heterogeneity and, in some cases, falsely indicated heterogeneity when there was none. NCOs successfully identified subgroups affected by unmeasured confounding. Even when NCOs did not perfectly satisfy their ideal assumptions, they remained informative, flagging potential bias in subgroup-level estimates, though not always pinpointing the subgroup with the largest confounding. Violations of conditional exchangeability substantially limit the validity of ITE estimates from causal ML models in routinely collected observational data. NCOs serve as a useful empirical diagnostic tool for detecting subgroup-specific unmeasured confounding and should be incorporated into causal ML workflows to support the credibility of individualized inference.
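The diagnostic described in the abstract can be illustrated with a minimal sketch under a hypothetical data-generating process; it is not the paper's simulation design. Here an unmeasured confounder `U` drives treatment only in subgroup `X == 1`, the true effect on `Y` is a constant 1.0 (no heterogeneity), and the NCO `N` shares `U` with `Y` but is unaffected by treatment. In place of causal forest or X-learner, it uses a within-subgroup difference in means, which is what an X-learner with stratum-mean base learners reduces to when the only covariate is a binary subgroup indicator. The function names `simulate` and `subgroup_contrast` and all parameter values are illustrative assumptions.

```python
import random
import statistics


def simulate(n, gamma=1.0, seed=0):
    """Simulate one dataset under a hypothetical data-generating process.

    X: measured binary subgroup indicator.
    U: unmeasured confounder; it drives treatment only in subgroup X == 1.
    A: binary treatment.
    Y: primary outcome with a constant true effect of 1.0 (no heterogeneity).
    N: negative control outcome sharing U with Y but unaffected by A.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        u = rng.gauss(0.0, 1.0)
        # Treatment depends on U only in subgroup X == 1 (violating exchangeability there).
        if x == 1:
            p_treat = 0.8 if u > 0 else 0.2
        else:
            p_treat = 0.5
        a = 1 if rng.random() < p_treat else 0
        y = 1.0 * a + gamma * u + rng.gauss(0.0, 1.0)
        nco = gamma * u + rng.gauss(0.0, 1.0)  # no causal path from A to N
        rows.append((x, a, y, nco))
    return rows


def subgroup_contrast(rows, x_value, outcome_index):
    """Treated-minus-control difference in means within subgroup X == x_value."""
    treated = [r[outcome_index] for r in rows if r[0] == x_value and r[1] == 1]
    control = [r[outcome_index] for r in rows if r[0] == x_value and r[1] == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


Y_COL, N_COL = 2, 3
data = simulate(n=20000)
results = {}
for x_value in (0, 1):
    est_y = subgroup_contrast(data, x_value, Y_COL)
    est_n = subgroup_contrast(data, x_value, N_COL)
    results[x_value] = (est_y, est_n)
    print(f"X={x_value}: effect on Y = {est_y:.2f} (truth 1.00), "
          f"effect on NCO = {est_n:.2f} (truth 0.00)")
```

Because `U` is never observed, any estimator that adjusts only for `X` inherits the same bias. In this setup the estimated effect on `Y` should be close to 1.0 in `X == 0` but inflated in `X == 1`, spuriously suggesting heterogeneity, while the NCO contrast should be near zero in `X == 0` and clearly positive in `X == 1`, flagging the confounded subgroup.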

