Partial Counterfactual Identification from Observational and Experimental Data
Aimed at researchers in causal inference, this work addresses partial counterfactual identification and offers a method to derive bounds from a combination of observational and experimental data. It is incremental in that it builds on existing SCM frameworks.
The paper bounds counterfactual queries using observational and experimental data together with a causal diagram. It translates the problem into a polynomial program and develops Monte Carlo algorithms that approximate the optimal bounds, and it validates them on synthetic and real-world datasets.
This paper investigates the problem of bounding counterfactual queries from an arbitrary collection of observational and experimental distributions and qualitative knowledge about the underlying data-generating model represented in the form of a causal diagram. We show that all counterfactual distributions in an arbitrary structural causal model (SCM) can be generated by a canonical family of SCMs with the same causal diagram, in which the unobserved (exogenous) variables are discrete with finite domains. Utilizing the canonical SCMs, we translate the problem of bounding counterfactuals into a polynomial program whose solution provides optimal bounds for the counterfactual query. Solving such polynomial programs is in general computationally expensive. We therefore develop effective Monte Carlo algorithms to approximate the optimal bounds from an arbitrary combination of observational and experimental data. Our algorithms are validated extensively on synthetic and real-world datasets.
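To make the canonical-SCM idea concrete, here is a minimal Python sketch of the smallest nontrivial case, built on assumptions of our own rather than on the paper's setup. The case has a binary treatment X and outcome Y, a diagram X -> Y with an unobserved confounder between them, observational P(x, y), experimental P(Y_x = 1) for both values of x, and the query PNS = P(Y_{x=1} = 1, Y_{x=0} = 0). In the assumed canonical model, U_X fixes X and U_Y selects one of four response functions (always 0, y = x, y = 1 - x, always 1). The mass of each observational cell therefore splits between the two response types that produce it. The function name `pns_bounds_monte_carlo` and the uniform sampler are hypothetical and illustrative, not the authors' Monte Carlo algorithm. Every sample is exactly consistent with the data, so the sampled minimum and maximum approximate the optimal bounds from inside.

```python
import random


def pns_bounds_monte_carlo(p_obs, p_exp, n_samples=20_000, seed=0):
    """Approximate bounds on PNS = P(Y_{x=1}=1, Y_{x=0}=0) for a confounded
    binary X -> Y by sampling canonical SCMs consistent with the data.

    p_obs: dict (x, y) -> P(X=x, Y=y)   observational distribution
    p_exp: dict x -> P(Y_x = 1)          experimental distributions
    """
    # Canonical SCM (assumed): U = (U_X, U_Y), X = U_X, and U_Y selects one of
    # four response functions: always-0, identity, negation, always-1.
    # Cell (x, y) of P(X, Y) splits between the two types that output y at x:
    #   (0,0): identity | always-0     (0,1): always-1 | negation
    #   (1,0): negation | always-0     (1,1): always-1 | identity
    a, b = p_obs[(0, 0)], p_obs[(0, 1)]
    c, d = p_obs[(1, 0)], p_obs[(1, 1)]
    # P(Y_1=1) = P(identity) + P(always-1) and P(Y_0=1) = P(negation) + P(always-1)
    # reduce to two linear equations on the free masses:
    #   m_id0 + m_one0 = P(Y_1=1) - d,   m_neg1 + m_one1 = P(Y_0=1) - b
    s1 = p_exp[1] - d
    s0 = p_exp[0] - b
    # Ranges that keep every response-type mass inside its cell.
    lo1, hi1 = max(0.0, s1 - b), min(a, s1)  # range of m_id0
    lo0, hi0 = max(0.0, s0 - c), min(d, s0)  # range of m_one1
    if lo1 > hi1 or lo0 > hi0:
        raise ValueError("observational and experimental data are incompatible")

    rng = random.Random(seed)
    lo, hi = float("inf"), float("-inf")
    for _ in range(n_samples):
        m_id0 = rng.uniform(lo1, hi1)   # mass on (U_X=0, identity)
        m_one1 = rng.uniform(lo0, hi0)  # mass on (U_X=1, always-1)
        # PNS = total mass on the identity type: m_id0 + (d - m_one1).
        pns = m_id0 + (d - m_one1)
        lo, hi = min(lo, pns), max(hi, pns)
    return lo, hi


p_obs = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
p_exp = {0: 0.4, 1: 0.7}
print(pns_bounds_monte_carlo(p_obs, p_exp))  # close to (0.3, 0.6)
```

On this input, the exact bounds are [0.3, 0.6], which agree with the closed-form PNS bounds of Tian and Pearl. The sampled range approaches these bounds from inside as `n_samples` grows. In this toy case, the program is linear in the masses because a single exogenous distribution governs the model. Products of parameters, and hence genuinely polynomial constraints, typically appear when several independent exogenous variables jointly determine a query.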