Partially Specified Causal Simulations
This addresses the need for more reliable validation of causal methods for researchers, though it is incremental as it builds on existing simulation practices.
The paper tackles the problem of improper simulation design in causal inference by introducing PARCS, a framework that generates data based on graphical causal models with adjustable parameters, and shows that using it improves findings in reproduced studies.
Simulation studies play a key role in the validation of causal inference methods. The simulation results are reliable only if the study is designed according to the promised operational conditions of the method-in-test. Still, many causal inference literature tend to design over-restricted or misspecified studies. In this paper, we elaborate on the problem of improper simulation design for causal methods and compile a list of desiderata for an effective simulation framework. We then introduce partially randomized causal simulation (PARCS), a simulation framework that meets those desiderata. PARCS synthesizes data based on graphical causal models and a wide range of adjustable parameters. There is a legible mapping from usual causal assumptions to the parameters, thus, users can identify and specify the subset of related parameters and randomize the remaining ones to generate a range of complying data-generating processes for their causal method. The result is a more comprehensive and inclusive empirical investigation for causal claims. Using PARCS, we reproduce and extend the simulation studies of two well-known causal discovery and missing data analysis papers to emphasize the necessity of a proper simulation design. Our results show that those papers would have improved and extended the findings, had they used PARCS for simulation. The framework is implemented as a Python package, too. By discussing the comprehensiveness and transparency of PARCS, we encourage causal inference researchers to utilize it as a standard tool for future works.