Deconfounded Score Method: Scoring DAGs with Dense Unobserved Confounding
This paper addresses a fundamental problem in causal inference, relevant to researchers and practitioners working with complex datasets in which unobserved variables affect many of the observed ones, and offers a solution that scales to high-dimensional problems.
The paper tackles causal discovery in the presence of widespread unobserved confounding, a setting in which conditional independencies alone leave causal effects unidentifiable. It proposes an adjusted score-based algorithm that approximately recovers sparse linear Gaussian DAGs, scales to high dimensions, and in practice remains robust to large deviations from the model assumptions.
Unobserved confounding is one of the greatest challenges for causal discovery. The case in which unobserved variables have a widespread effect on many of the observed ones is particularly difficult because most pairs of variables are conditionally dependent given any other subset, rendering the causal effect unidentifiable. In this paper we show that beyond conditional independencies, under the principle of independent mechanisms, unobserved confounding in this setting leaves a statistical footprint in the observed data distribution that allows for disentangling spurious and causal effects. Using this insight, we demonstrate that a sparse linear Gaussian directed acyclic graph among observed variables may be recovered approximately and propose an adjusted score-based causal discovery algorithm that may be implemented with general purpose solvers and scales to high-dimensional problems. We find, in addition, that despite the conditions we pose to guarantee causal recovery, performance in practice is robust to large deviations in model assumptions.
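To make the idea of a "statistical footprint" concrete, the sketch below works through one well-known way such a footprint can arise in a linear Gaussian model. It is an illustration under stated assumptions and is not taken from the paper: the model X = X B + H G + E (row-vector form), the chain-shaped DAG, the unit noise variances, and all names (`precision_decomposition`, `B`, `G`, `S`, `L`) are hypothetical choices made for this example. Under those assumptions, the Woodbury identity splits the precision matrix of the observed variables into a sparse term determined by the DAG minus a low-rank term determined by the dense confounders. The precision matrix itself is dense, which mirrors the abstract's remark that most pairs of variables are conditionally dependent, yet the sparse and low-rank parts remain structurally distinguishable.

```python
import numpy as np


def precision_decomposition(p=20, k=2, weight=0.5, seed=0):
    """Population precision matrix of X = X B + H G + E (row-vector form).

    Assumed illustrative model, not the paper's exact setup:
      H ~ N(0, I_k)  latent confounders with dense loadings G (k x p)
      E ~ N(0, I_p)  independent noise
      B              sparse weights of a chain DAG 0 -> 1 -> ... -> p-1
    Returns (Theta, S, L) with Theta = S - L.
    """
    rng = np.random.default_rng(seed)

    # Sparse DAG: B[i, j] != 0 means an edge i -> j.
    B = np.zeros((p, p))
    idx = np.arange(p - 1)
    B[idx, idx + 1] = weight

    # Dense confounding: every latent variable loads on every observed one.
    G = rng.normal(size=(k, p))

    I_p, I_k = np.eye(p), np.eye(k)
    A = np.linalg.inv(I_p - B)

    # Cov(X) = A^T (I + G^T G) A, so Theta = (I - B)(I + G^T G)^{-1}(I - B)^T.
    Sigma = A.T @ (I_p + G.T @ G) @ A
    Theta = np.linalg.inv(Sigma)

    # Woodbury: (I + G^T G)^{-1} = I - G^T (I_k + G G^T)^{-1} G.
    S = (I_p - B) @ (I_p - B).T                   # sparse part (DAG only)
    M = G @ (I_p - B).T                           # k x p
    L = M.T @ np.linalg.inv(I_k + G @ G.T) @ M    # low-rank part (rank <= k)
    return Theta, S, L


if __name__ == "__main__":
    Theta, S, L = precision_decomposition()
    print("Theta equals S - L:", np.allclose(Theta, S - L, atol=1e-8))
    print("fraction of nonzeros in Theta:", np.mean(np.abs(Theta) > 1e-8))
    print("fraction of nonzeros in S:", np.mean(np.abs(S) > 1e-8))
    print("numerical rank of L:", np.linalg.matrix_rank(L, tol=1e-8))
```

In this toy setting the sparse part S has nonzero entries only between variables that are adjacent or share a child in the DAG, while L has rank at most k no matter how many observed variables the confounders touch. Separating the two from finite samples, and then scoring DAGs on the adjusted quantity, is a harder estimation problem that this sketch does not attempt.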