ME LG · Jun 1, 2023

Calibrated and Conformal Propensity Scores for Causal Effect Estimation

Microsoft
arXiv:2306.00382v2 · 1 citation · h-index: 25
Originality: Incremental advance
AI Analysis

This work is aimed at researchers and practitioners who estimate causal effects from observational data, and it offers an incremental improvement by calibrating the propensity score model.

The authors tackle biased treatment effect estimation from observational data by requiring that learned propensity scores be calibrated. They prove that calibration is necessary for unbiased estimation with inverse propensity weighted and doubly robust estimators, and show that it tightens the error bound on the causal effect estimate and avoids extreme propensity weights. They report improved causal effect estimation on several tasks, including high-dimensional image covariates and genome-wide association studies (GWASs), where calibrated scores made the analysis more than twice as fast by enabling simpler models that are faster to train.

Propensity scores are commonly used to estimate treatment effects from observational data. We argue that the probabilistic output of a learned propensity score model should be calibrated -- i.e., a predictive treatment probability of 90% should correspond to 90% of individuals being assigned the treatment group -- and we propose simple recalibration techniques to ensure this property. We prove that calibration is a necessary condition for unbiased treatment effect estimation when using popular inverse propensity weighted and doubly robust estimators. We derive error bounds on causal effect estimates that directly relate to the quality of uncertainties provided by the probabilistic propensity score model and show that calibration strictly improves this error bound while also avoiding extreme propensity weights. We demonstrate improved causal effect estimation with calibrated propensity scores in several tasks including high-dimensional image covariates and genome-wide association studies (GWASs). Calibrated propensity scores improve the speed of GWAS analysis by more than two-fold by enabling the use of simpler models that are faster to train.
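The idea in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes histogram binning as the recalibrator (the abstract only says "simple recalibration techniques"), a synthetic data-generating process with a true treatment effect of 2, and a deliberately miscalibrated score equal to the square of the true propensity. The function names `histogram_recalibrate` and `ipw_ate`, the clipping constant `eps`, and all simulation parameters are hypothetical choices made for illustration. For brevity the recalibrator is fit on the same sample used for estimation; a held-out split would be the more careful choice.

```python
import random


def histogram_recalibrate(scores, treatments, n_bins=20):
    """Replace each raw score by the observed treatment rate in its bin."""
    def bin_of(s):
        return min(int(s * n_bins), n_bins - 1)

    totals = [0] * n_bins
    treated = [0] * n_bins
    for s, t in zip(scores, treatments):
        totals[bin_of(s)] += 1
        treated[bin_of(s)] += t
    # Every score lies in a bin that contains at least itself, so totals > 0.
    return [treated[bin_of(s)] / totals[bin_of(s)] for s in scores]


def ipw_ate(outcomes, treatments, propensities, eps=1e-3):
    """Inverse propensity weighted estimate of the average treatment effect.

    Propensities are clipped to [eps, 1 - eps] (an assumption of this sketch)
    so that the weights stay finite.
    """
    total = 0.0
    for y, t, e in zip(outcomes, treatments, propensities):
        e = min(max(e, eps), 1.0 - eps)
        total += t * y / e - (1 - t) * y / (1.0 - e)
    return total / len(outcomes)


def max_weight(treatments, propensities):
    """Largest inverse propensity weight actually applied to any unit."""
    return max(1.0 / e if t == 1 else 1.0 / (1.0 - e)
               for t, e in zip(treatments, propensities))


# Synthetic observational data (all numbers here are illustrative).
random.seed(0)
n = 20000
true_ate = 2.0
xs = [random.random() for _ in range(n)]            # one confounder
true_e = [0.2 + 0.6 * x for x in xs]                # true propensity
ts = [1 if random.random() < e else 0 for e in true_e]
ys = [true_ate * t + 3.0 * x + random.gauss(0.0, 1.0)
      for t, x in zip(ts, xs)]

# A miscalibrated propensity model: correct ranking, wrong probabilities.
raw = [e ** 2 for e in true_e]
cal = histogram_recalibrate(raw, ts)

ate_raw = ipw_ate(ys, ts, raw)
ate_cal = ipw_ate(ys, ts, cal)

print("true ATE:            ", true_ate)
print("IPW, raw scores:     ", round(ate_raw, 3))
print("IPW, recalibrated:   ", round(ate_cal, 3))
print("max weight, raw:     ", round(max_weight(ts, raw), 2))
print("max weight, recalib.:", round(max_weight(ts, cal), 2))
```

In this setup the raw scores rank individuals correctly but understate the treatment probability, so the inverse weights for treated units are too large and the estimate is biased. Recalibration keeps the ranking and only corrects the probabilities, which is why it can rescue a simple or poorly fit propensity model.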

