Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources
This work addresses the challenge of fusing heterogeneous data sources for counterfactual inference in causal modelling. The contribution is incremental: it extends existing methods to handle multiple, possibly biased, datasets.
The paper tackles the problem of integrating multiple, possibly biased, observational and interventional datasets to compute bounds on counterfactual queries in structural causal models. For a single observational dataset affected by selection bias, it shows that the likelihood has no local maxima, so a causal expectation-maximisation scheme can be used to approximate the bounds. The general case of multiple datasets is reduced to this one through graphical transformations. Numerical experiments and a palliative care case study illustrate the effectiveness of the approach.
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies, to eventually compute counterfactuals in structural causal models. We start from the case of a single observational dataset affected by a selection bias. We show that the likelihood of the available data has no local maxima. This enables us to use the causal expectation-maximisation scheme to approximate the bounds for partially identifiable counterfactual queries, which are the focus of this paper. We then show how the same approach can address the general case of multiple datasets, no matter whether interventional or observational, biased or unbiased, by remapping it into the former one via graphical transformations. Systematic numerical experiments and a case study on palliative care show the effectiveness of our approach, while hinting at the benefits of fusing heterogeneous data sources to get informative outcomes in case of partial identifiability.
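As a rough illustration of how repeated EM runs can approximate counterfactual bounds under selection bias, the sketch below uses a toy model that is entirely an assumption of this example, not the authors' implementation. It assumes binary X and Y, independent exogenous variables U (with X = U) and V (indexing the four response functions of Y), and a known deterministic selector S = select[X, Y]. Units with S = 0 contribute only their count. Each EM run starts from a random point and returns one maximum-likelihood parametrisation; the range of the query across runs gives an inner approximation of the bounds. The query is the probability of necessity and sufficiency (PNS), and all function and variable names are hypothetical.

```python
import numpy as np

# Canonical response functions for Y: F[v] = (f_v(0), f_v(1)).
F = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
X_OF = np.repeat(np.arange(2)[:, None], 4, axis=1)  # X_OF[u, v] = u, since X = U
Y_OF = F.T                                          # Y_OF[u, v] = f_v(u)

def em_run(counts, n_unselected, select, rng, iters=500):
    """One EM run from a random start; returns the estimates (P(U), P(V)).

    counts[x, y] : number of selected units observed with X=x, Y=y
    n_unselected : number of units known only to have S=0
    select[x, y] : 1 if a unit with (x, y) is selected, else 0 (assumed known)
    """
    th_u, th_v = rng.dirichlet(np.ones(2)), rng.dirichlet(np.ones(4))
    s_of = select[X_OF, Y_OF]  # selector value implied by each (u, v)
    for _ in range(iters):
        prior = np.outer(th_u, th_v)
        expected = np.zeros((2, 4))
        # E-step: spread each observation over the (u, v) states compatible with it.
        for x in range(2):
            for y in range(2):
                if counts[x, y] > 0:
                    w = prior * ((X_OF == x) & (Y_OF == y))
                    expected += counts[x, y] * w / w.sum()
        if n_unselected > 0:
            w = prior * (s_of == 0)
            expected += n_unselected * w / w.sum()
        # M-step: renormalise the expected counts into new marginals.
        th_u = expected.sum(axis=1) / expected.sum()
        th_v = expected.sum(axis=0) / expected.sum()
    return th_u, th_v

def approx_pns_bounds(counts, n_unselected, select, runs=30, seed=0):
    """Min and max of PNS over several EM runs (inner approximation of the bounds)."""
    rng = np.random.default_rng(seed)
    # PNS = P(Y_{X=0}=0, Y_{X=1}=1) = P(V=1) in this parametrisation.
    pns = [em_run(counts, n_unselected, select, rng)[1][1] for _ in range(runs)]
    return min(pns), max(pns)

# Unbiased toy data: every unit is selected.
counts = np.array([[40, 10], [15, 35]])
lo, hi = approx_pns_bounds(counts, 0, np.ones((2, 2), dtype=int))

# Biased toy data: units with X=1 are never selected, only their count is known.
biased_counts = np.array([[40, 10], [0, 0]])
select = np.array([[1, 1], [0, 0]])
lo_b, hi_b = approx_pns_bounds(biased_counts, 50, select)

print(f"unbiased: [{lo:.3f}, {hi:.3f}]  biased: [{lo_b:.3f}, {hi_b:.3f}]")
```

In the unbiased toy case the exact PNS bounds are [0.5, 0.7], so the interval returned by the sketch should fall inside that range up to numerical tolerance. In the biased case nothing is observed about Y when X = 1, and the exact bounds widen to [0, 0.8]. Fusing in further datasets, as the abstract describes, is one way such intervals can become informative again.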