Information-Theoretic Causal Bounds under Unmeasured Confounding

arXiv:2601.17160v3 | h-index: 2 | Has Code

Originality: Highly original

AI Analysis

This work addresses a fundamental challenge in causal inference, namely estimating causal effects when confounders are unmeasured, and gives researchers and practitioners a more flexible, data-driven way to bound those effects. It is incremental in that it builds on existing partial identification methods.

The paper tackles the estimation of causal effects under unmeasured confounding. It develops an information-theoretic framework that provides sharp partial identification without relying on restrictive assumptions, external inputs, or full structural models, and it reports tight and valid bounds in simulations and real-world applications.

We develop a data-driven information-theoretic framework for sharp partial identification of causal effects under unmeasured confounding. Existing approaches often rely on restrictive assumptions, such as bounded or discrete outcomes; require external inputs (for example, instrumental variables, proxies, or user-specified sensitivity parameters); necessitate full structural causal model specifications; or focus solely on population-level averages while neglecting covariate-conditional effects. We overcome all four limitations simultaneously by establishing novel information-theoretic, data-driven divergence bounds. Our key theoretical contribution shows that the f-divergence between the observational distribution P(Y | A = a, X = x) and the interventional distribution P(Y | do(A = a), X = x) is upper bounded by a function of the propensity score alone. This result enables sharp partial identification of conditional causal effects directly from observational data, without requiring external sensitivity parameters, auxiliary variables, full structural specifications, or outcome boundedness assumptions. For practical implementation, we develop a semiparametric estimator satisfying Neyman orthogonality (Chernozhukov et al., 2018), which ensures root-n consistent inference even when nuisance functions are estimated via flexible machine learning methods. Simulation studies and real-world data applications, implemented in the GitHub repository (https://github.com/yonghanjung/Information-Theretic-Bounds), demonstrate that our framework provides tight and valid causal bounds across a wide range of data-generating processes.
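To give a feel for why a divergence bound can depend on the propensity score alone, here is a minimal numerical sketch for the special case of KL divergence with a discrete outcome. It is not taken from the paper or its repository, and the paper's general f-divergence bound may take a different form. The sketch assumes consistency and a pre-treatment covariate X, so that the interventional distribution is a mixture: P(Y | do(A = a), x) = e * P(Y | A = a, x) + (1 - e) * Q, where e = P(A = a | x) is the propensity score and Q is the unobserved distribution of the potential outcome among units with A != a. Because the mixture is at least e * P(Y | A = a, x) pointwise, the density ratio of observational to interventional is at most 1/e, and so the KL divergence is at most -log(e). The function names below (kl_divergence, interventional_from_mixture, kl_upper_bound) are hypothetical and chosen only for illustration.

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions on the same support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def interventional_from_mixture(p_obs, p_unobserved, propensity):
    """Assumed mixture form of P(Y | do(A=a), x).

    p_obs        : P(Y | A=a, x), identifiable from data
    p_unobserved : P(Y_a | A!=a, x), never observed (hypothetical input here)
    propensity   : e = P(A=a | x)
    """
    p_obs = np.asarray(p_obs, dtype=float)
    p_unobserved = np.asarray(p_unobserved, dtype=float)
    return propensity * p_obs + (1.0 - propensity) * p_unobserved


def kl_upper_bound(propensity):
    """Bound for the KL case in this sketch: KL(P_obs || P_do) <= -log(e)."""
    return float(-np.log(propensity))


# Check the inequality over many random choices of the unobserved component.
rng = np.random.default_rng(0)
gaps = []
for _ in range(1000):
    p_obs = rng.dirichlet(np.ones(5))
    p_unobserved = rng.dirichlet(np.ones(5))
    e = rng.uniform(0.05, 0.95)
    p_do = interventional_from_mixture(p_obs, p_unobserved, e)
    gaps.append(kl_upper_bound(e) - kl_divergence(p_obs, p_do))

worst_gap = min(gaps)
print("bound held in every draw:", worst_gap >= 0)
```

In this sketch the bound is attained when the unobserved component puts all its mass where the observational distribution has none, for example p_obs = [1, 0] and p_unobserved = [0, 1]. That worst case needs no knowledge of the confounder, which is the intuition behind bounds that require no external sensitivity parameter.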
