Recover Experimental Data with Selection Bias using Counterfactual Logic
This work addresses selection bias in causal inference for researchers and practitioners, offering incremental improvements by extending existing methods to experimental data with partial observational information.
The paper addresses the problem of recovering unbiased experimental data affected by selection bias. It analyzes how selection mechanisms propagate to counterfactual domains, derives criteria for determining when experimental distributions remain unaffected, and proposes methods that use partially unbiased observational data to recover the unbiased distribution. Simulation studies demonstrate practical utility.
Selection bias, arising from the systematic inclusion or exclusion of certain samples, poses a significant challenge to the validity of causal inference. While Bareinboim et al. introduced methods for recovering unbiased observational and interventional distributions from biased data using partial external information, the complexity of the backdoor adjustment and the method's strong reliance on observational data limit its applicability in many practical settings. In this paper, we formally characterize the recoverability of $P(Y^*_{x^*})$ under selection bias with experimental data. By explicitly constructing counterfactual worlds via Structural Causal Models (SCMs), we analyze how selection mechanisms in the observational world propagate to the counterfactual domain. We derive a complete set of graphical and theoretical criteria for determining when the experimental distribution remains unaffected by selection bias. Furthermore, we propose principled methods for leveraging partially unbiased observational data to recover $P(Y^*_{x^*})$ from biased experimental datasets. Simulation studies replicating realistic research scenarios demonstrate the practical utility of our approach, offering concrete guidance for mitigating selection bias in applied causal inference.
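To make the setting concrete, the following is a minimal sketch of how selection can distort an experimental distribution and how unbiased covariate information can undo the distortion. It is not the paper's criteria or algorithm. It uses a standard stratified reweighting and assumes a hypothetical toy model in which a binary covariate Z affects both the outcome Y and the selection indicator S, the treatment X is randomized, and S is independent of Y given (X, Z). All probability tables and function names are illustrative assumptions.

```python
# Hypothetical toy model: every number below is an illustrative assumption,
# not a value taken from the paper.
P_Z = {0: 0.5, 1: 0.5}                      # unbiased P(z), e.g. from external data
P_Y_GIVEN_XZ = {(0, 0): 0.2, (0, 1): 0.6,   # P(Y=1 | do(x), z)
                (1, 0): 0.4, (1, 1): 0.9}
P_S_GIVEN_Z = {0: 0.1, 1: 0.8}              # P(S=1 | z): selection depends on Z only


def true_effect(x):
    """P(Y_x = 1) in the full, unselected population."""
    return sum(P_Y_GIVEN_XZ[(x, z)] * P_Z[z] for z in P_Z)


def selected_joint(x):
    """P(z, y | do(x), S=1): the joint seen in the selected experimental sample."""
    p_s = sum(P_S_GIVEN_Z[z] * P_Z[z] for z in P_Z)
    joint = {}
    for z in P_Z:
        for y in (0, 1):
            p_y = P_Y_GIVEN_XZ[(x, z)] if y == 1 else 1.0 - P_Y_GIVEN_XZ[(x, z)]
            joint[(z, y)] = P_Z[z] * p_y * P_S_GIVEN_Z[z] / p_s
    return joint


def biased_effect(x):
    """P(Y=1 | do(x), S=1): the naive estimate from selected samples."""
    joint = selected_joint(x)
    return sum(joint[(z, 1)] for z in P_Z)


def recovered_effect(x):
    """Sum over z of P(Y=1 | do(x), z, S=1) * P(z), using the unbiased P(z)."""
    joint = selected_joint(x)
    total = 0.0
    for z in P_Z:
        # Stratum-level outcome rate computed only from selected samples.
        p_y_given_z = joint[(z, 1)] / (joint[(z, 0)] + joint[(z, 1)])
        total += p_y_given_z * P_Z[z]
    return total


if __name__ == "__main__":
    for x in (0, 1):
        print(x, true_effect(x), biased_effect(x), recovered_effect(x))
```

In this toy model the naive estimate overstates the outcome rate because selection favors Z = 1, while the reweighted estimate matches the population value. The reweighting is valid here only because of the assumed independence of S and Y given (X, Z); when that assumption fails, stratifying on Z alone would not remove the bias.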