Valid Inference After Causal Discovery
This paper addresses a fundamental statistical challenge facing researchers and practitioners in causal inference: keeping statistical inference valid after causal discovery has been run on the same data.
The paper tackles the problem of invalid statistical inference when estimating causal effects after using the same data for causal discovery, which leads to inflated miscoverage rates. It develops a method that provides reliable coverage and achieves more accurate causal discovery than data splitting.
Causal discovery and causal effect estimation are two fundamental tasks in causal inference. While many methods have been developed for each task individually, statistical challenges arise when applying these methods jointly: estimating causal effects after running causal discovery algorithms on the same data leads to "double dipping," invalidating the coverage guarantees of classical confidence intervals. To this end, we develop tools for valid post-causal-discovery inference. Across empirical studies, we show that a naive combination of causal discovery and subsequent inference algorithms leads to highly inflated miscoverage rates; on the other hand, applying our method provides reliable coverage while achieving more accurate causal discovery than data splitting.
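The "double dipping" problem described above can be illustrated with a small simulation. The sketch below is not the paper's method. It assumes a toy stand-in for causal discovery (picking the covariate most correlated with the outcome), a setting in which every true slope is zero, and a normal-approximation 95% interval. The function names `slope_ci`, `select`, and `simulate` are hypothetical and chosen only for this illustration. It compares the coverage of a naive interval computed on the same data used for selection against the data-splitting baseline.

```python
import numpy as np


def slope_ci(x, y, z=1.96):
    """Normal-approximation 95% CI for the OLS slope of y on x (with intercept)."""
    xc = x - x.mean()
    yc = y - y.mean()
    sxx = xc @ xc
    beta = (xc @ yc) / sxx
    resid = yc - beta * xc
    se = np.sqrt(resid @ resid / (len(x) - 2) / sxx)
    return beta - z * se, beta + z * se


def select(X, y):
    """Toy 'discovery' step (an assumption): pick the column most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0) * (yc @ yc))
    return int(np.argmax(np.abs(corr)))


def simulate(n=200, p=10, reps=2000, seed=0):
    """Return (naive coverage, data-splitting coverage) of the true slope 0."""
    rng = np.random.default_rng(seed)
    naive_hits = 0
    split_hits = 0
    half = n // 2
    for _ in range(reps):
        X = rng.standard_normal((n, p))
        y = rng.standard_normal(n)  # y is independent of X, so every true slope is 0

        # Naive: select and infer on the same data (double dipping).
        j = select(X, y)
        lo, hi = slope_ci(X[:, j], y)
        naive_hits += int(lo <= 0.0 <= hi)

        # Data splitting: select on the first half, infer on the second half.
        j = select(X[:half], y[:half])
        lo, hi = slope_ci(X[half:, j], y[half:])
        split_hits += int(lo <= 0.0 <= hi)
    return naive_hits / reps, split_hits / reps


naive_cov, split_cov = simulate()
print(f"naive coverage: {naive_cov:.3f}")
print(f"split coverage: {split_cov:.3f}")
```

In this toy setting the naive interval should cover the true value far less often than the nominal 95%, because the selected covariate is the one that looks most associated with the outcome by chance. Data splitting restores coverage here but uses only half the sample for selection, which is the accuracy cost that the abstract contrasts with the proposed method.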