Learned Causal Method Prediction
This work addresses a key bottleneck for researchers and practitioners in causal inference by automating method selection, though the contribution is incremental because it builds on existing causal methods.
The paper tackles the problem of efficiently selecting the best causal inference method for a given dataset, which is challenging because the methods rely on unverifiable assumptions and no ground truth is available. It proposes CAMP, a framework that predicts the best method by training on synthetically generated datasets and using self-supervised pre-training. In the reported experiments, CAMP outperforms selecting any individual candidate method and generalizes to unseen semi-synthetic and real-world benchmarks.
Given a causal question and a dataset, it is important to decide efficiently which causal inference method to use. This is challenging because causal methods typically rely on complex and difficult-to-verify assumptions, and cross-validation is not applicable since ground-truth causal quantities are unobserved. In this work, we propose CAusal Method Predictor (CAMP), a framework for predicting the best method for a given dataset. To this end, we generate datasets from a diverse set of synthetic causal models, score the candidate methods on each dataset, and train a model to directly predict the highest-scoring method for each dataset. Next, by formulating a self-supervised pre-training objective centered on dataset assumptions relevant for causal inference, we significantly reduce the need for costly labeled data and enhance training efficiency. Our strategy learns to map implicit dataset properties to the best method in a data-driven manner. In our experiments, we focus on method prediction for causal discovery. CAMP outperforms selecting any individual candidate method and demonstrates promising generalization to unseen semi-synthetic and real-world benchmarks.
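The generate, score, and train loop described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the toy task is effect estimation rather than causal discovery, the two candidate methods (`naive_regression`, `adjusted_regression`), the hand-crafted features in `featurize`, and the nearest-centroid predictor are all hypothetical stand-ins, and the self-supervised pre-training step is omitted.

```python
import numpy as np

def sample_dataset(rng, n=100):
    """Toy synthetic causal model (hypothetical): Z -> X -> Y, with Z -> Y half the time."""
    effect = rng.uniform(-2.0, 2.0)  # ground-truth effect of X on Y, known only in simulation
    confounding = rng.uniform(0.5, 2.0) if rng.random() < 0.5 else 0.0
    z = rng.normal(size=n)
    x = rng.uniform(0.5, 2.0) * z + rng.normal(size=n)
    y = effect * x + confounding * z + rng.normal(size=n)
    return np.column_stack([x, y, z]), effect

def naive_regression(data):
    """Candidate method 0: slope of Y on X, ignoring Z."""
    x, y, _ = data.T
    return np.polyfit(x, y, 1)[0]

def adjusted_regression(data):
    """Candidate method 1: coefficient of X when regressing Y on X and Z."""
    x, y, z = data.T
    design = np.column_stack([x, z, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0]

METHODS = [naive_regression, adjusted_regression]

def label_dataset(data, effect):
    """Score each method by negative absolute error; the label is the best method's index."""
    scores = [-abs(method(data) - effect) for method in METHODS]
    return int(np.argmax(scores))

def featurize(data):
    """Hand-crafted dataset features: pairwise correlations and corr(Y, Z | X)."""
    corr = np.corrcoef(data, rowvar=False)
    r_xy, r_xz, r_yz = corr[0, 1], corr[0, 2], corr[1, 2]
    partial = (r_yz - r_xy * r_xz) / np.sqrt((1 - r_xy**2) * (1 - r_xz**2))
    return np.array([r_xy, r_xz, r_yz, partial])

def fit_nearest_centroid(features, labels):
    """Store one mean feature vector per method label seen in training."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_method(model, feature):
    """Return the method index whose centroid is closest to the feature vector."""
    classes, centroids = model
    distances = np.linalg.norm(centroids - feature, axis=1)
    return int(classes[np.argmin(distances)])

# Build the labeled training set from synthetic models, then fit the predictor.
rng = np.random.default_rng(0)
train = [sample_dataset(rng) for _ in range(500)]
features = np.array([featurize(data) for data, _ in train])
labels = np.array([label_dataset(data, effect) for data, effect in train])
model = fit_nearest_centroid(features, labels)

# At prediction time no ground truth is needed: only dataset features are used.
new_data, _ = sample_dataset(rng)
chosen = predict_method(model, featurize(new_data))
print("predicted best method:", METHODS[chosen].__name__)
```

The point of the sketch is the division of labor: ground truth is used only to label synthetic datasets, while the trained predictor sees nothing but the dataset itself. A learned dataset encoder would replace `featurize` in a more realistic setup.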