Simulations evaluating resampling methods for causal discovery: ensemble performance and calibration
This work addresses the problem of trust in causal discovery for scientific and medical fields, but its contribution is incremental: it is a simulation-based evaluation of existing resampling techniques rather than a new method.
The paper tackles the problem of determining when causal discovery outputs can be trusted in real-world settings by evaluating resampling methods as indicators of confidence in discovered graph features. It finds that subsampling and sampling with replacement both perform well, and that their calibration has different convergence properties, so the choice between them should depend on the sample size.
Causal discovery can be a powerful tool for investigating causality when a system can be observed but is, in practice, inaccessible to experiments. Despite this, it is rarely used in scientific or medical fields. One of the major hurdles preventing the field of causal discovery from having a larger impact is that it is difficult to determine when the output of a causal discovery method can be trusted in a real-world setting. Trust is especially critical when human health is on the line. In this paper, we report the results of a series of simulation studies investigating the performance of different resampling methods as indicators of confidence in discovered graph features. We found that subsampling and sampling with replacement both performed surprisingly well, suggesting that they can serve as grounds for confidence in graph features. We also found that the calibration of subsampling and sampling with replacement had different convergence properties, suggesting that one's choice of which to use should depend on the sample size.
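To make the idea of resampling as a confidence indicator concrete, the following is a minimal sketch, not the procedure used in the simulation studies. It assumes a hypothetical stand-in "discovery" step (`toy_adjacencies`, which simply thresholds absolute correlations) in place of a real causal discovery algorithm, and the names `resampled_frequencies`, `threshold`, and `fraction` are illustrative choices. The confidence assigned to a graph feature is the fraction of resampled datasets in which that feature is recovered.

```python
import numpy as np

def toy_adjacencies(data, threshold=0.3):
    """Hypothetical stand-in for a causal discovery method: report an
    undirected adjacency wherever |correlation| exceeds `threshold`."""
    corr = np.corrcoef(data, rowvar=False)
    adj = np.abs(corr) > threshold
    np.fill_diagonal(adj, False)  # a variable is never adjacent to itself
    return adj

def resampled_frequencies(data, n_resamples=200, replace=True,
                          fraction=0.5, seed=0):
    """Fraction of resamples in which each adjacency is found.

    replace=True  -> sampling with replacement, n rows per resample
    replace=False -> subsampling, int(fraction * n) rows per resample
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    size = n if replace else int(fraction * n)
    counts = np.zeros((p, p))
    for _ in range(n_resamples):
        idx = rng.choice(n, size=size, replace=replace)
        counts += toy_adjacencies(data[idx])
    return counts / n_resamples

# Simulated data (assumed for illustration): x influences y, z is independent.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.8 * x + rng.normal(size=500)
z = rng.normal(size=500)
data = np.column_stack([x, y, z])

print(resampled_frequencies(data, replace=True))   # sampling with replacement
print(resampled_frequencies(data, replace=False))  # subsampling
```

In a simulation study the true graph is known, so frequencies like these could be compared against how often features with a given frequency are actually correct, which is one way to assess calibration at different sample sizes.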