Recovering Latent Confounders from High-dimensional Proxy Variables
This work addresses a key bottleneck in causal inference for disciplines that rely on high-dimensional observed proxies, such as climate science, by enabling more accurate effect estimation. The contribution appears incremental, however, in that it extends existing methods to broader conditions.
The paper tackles the problem of detecting latent confounders from high-dimensional proxy variables for causal effect estimation, removing previous restrictions to low-dimensional proxies, sorted proxies, and binary treatments. It presents a Proxy Confounder Factorization (PCF) framework that achieves high correlation with the latent confounder and low absolute error in causal effect estimation on synthetic datasets, and on climate data it recovers four components explaining 75.9% of the variance in the North Atlantic Oscillation.
Detecting latent confounders from proxy variables is an essential problem in causal effect estimation. Previous approaches are limited to low-dimensional proxies, sorted proxies, and binary treatments. We remove these assumptions and present a novel Proxy Confounder Factorization (PCF) framework for continuous treatment effect estimation when latent confounders manifest through high-dimensional, mixed proxy variables. In the high sample size regime, both our two-step PCF implementation, which uses Independent Component Analysis (ICA-PCF), and our end-to-end implementation, which uses Gradient Descent (GD-PCF), achieve high correlation with the latent confounder and low absolute error in causal effect estimation on synthetic datasets. On real climate data, ICA-PCF recovers four components that explain $75.9\%$ of the variance in the North Atlantic Oscillation, a known confounder of precipitation patterns in Europe. Code for our PCF implementations and experiments is available at https://github.com/IPL-UV/confound_it. The proposed methodology is a stepping stone towards discovering latent confounders and can be applied to many problems in disciplines dealing with high-dimensional observed proxies, e.g., spatiotemporal fields.
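To make the two-step idea concrete, the following is a minimal sketch of a factorize-then-adjust pipeline on synthetic data. It is not the code from the repository above and not the authors' ICA-PCF: it assumes a single latent confounder with linear effects, and it swaps in PCA (via a NumPy SVD) for ICA to stay dependency-light. All function names, coefficients, and sample sizes (`simulate`, `recover_confounder`, `ols_effect`, `true_effect=2.0`, and so on) are hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n=5000, p=50, true_effect=2.0, seed=0):
    """Toy data: one latent confounder z drives proxies, treatment, and outcome."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                        # latent confounder (unobserved in practice)
    loadings = rng.normal(size=p)                 # how z shows up in each proxy dimension
    proxies = np.outer(z, loadings) + 0.5 * rng.normal(size=(n, p))
    t = 1.5 * z + rng.normal(size=n)              # continuous treatment, confounded by z
    y = true_effect * t + 3.0 * z + rng.normal(size=n)
    return proxies, t, y, z


def recover_confounder(proxies, k=1):
    """Step 1 (assumption: PCA stand-in for ICA): leading k component scores."""
    centered = proxies - proxies.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]                       # shape (n, k)


def ols_effect(t, y, covariates=None):
    """Step 2: coefficient on t from least squares, optionally adjusting for covariates."""
    cols = [np.ones_like(t), t]
    if covariates is not None:
        cols.extend(covariates.T)                 # one column per recovered component
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


proxies, t, y, z = simulate()
z_hat = recover_confounder(proxies)

naive = ols_effect(t, y)                          # ignores the confounder
adjusted = ols_effect(t, y, z_hat)                # adjusts for the recovered component
corr = abs(np.corrcoef(z_hat[:, 0], z)[0, 1])     # sign of a component is arbitrary

print(f"|corr(z_hat, z)| = {corr:.3f}")
print(f"naive effect     = {naive:.3f}")
print(f"adjusted effect  = {adjusted:.3f}")
```

The two evaluation quantities mirror those named in the abstract: correlation between the recovered component and the true confounder, and absolute error of the estimated effect relative to the simulated `true_effect`. With several latent sources, PCA components are only identified up to rotation, which is one reason an ICA step may be preferable in that setting.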