A Knowledge-Informed Pretrained Model for Causal Discovery
This work addresses a practical deployment challenge in causal discovery, namely settings where only coarse domain knowledge is available, and represents an incremental advance.
The paper proposes a knowledge-informed pretrained model for causal discovery that integrates weak prior knowledge as a middle ground between costly interventional signals and purely data-driven methods, and reports consistent improvements over existing baselines on in-distribution, out-of-distribution, and real-world datasets.
Causal discovery has been widely studied, yet many existing methods rely on strong assumptions or fall into one of two extremes: they either depend on costly interventional signals or partial ground truth as strong priors, or adopt purely data-driven paradigms with limited guidance. Both extremes hinder practical deployment. Motivated by real-world scenarios where only coarse domain knowledge is available, we propose a knowledge-informed pretrained model for causal discovery that integrates weak prior knowledge as a principled middle ground. Our model adopts a dual-source encoder-decoder architecture to process observational data in a knowledge-informed manner. We design a diverse pretraining dataset and a curriculum learning strategy that smoothly adapts the model to varying prior strengths across mechanisms, graph densities, and variable scales. Extensive experiments on in-distribution, out-of-distribution, and real-world datasets demonstrate consistent improvements over existing baselines, along with strong robustness and practical applicability.
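The abstract does not say how prior strength, graph density, variable scale, and mechanism are scheduled during pretraining. Purely as a hypothetical sketch, and not the authors' procedure, one way to generate curriculum-ordered pretraining tasks is to map training progress t in [0, 1] to a task configuration: the fraction of revealed prior entries decays linearly while the number of variables, the expected degree, and the set of mechanism families grow. The function names (`curriculum_config`, `sample_dag`, `weak_prior`, `make_pretraining_task`), the numeric ranges, the mechanism names, and the prior encoding (1 = edge believed present, 0 = edge believed absent, -1 = unknown) are all assumptions. Simulating the observational data from the sampled mechanism is omitted.

```python
import random

# Hypothetical mechanism families, ordered roughly from easy to hard.
MECHANISMS = ["linear", "mlp", "sigmoid"]
UNKNOWN = -1  # prior entry carrying no information about the pair


def curriculum_config(step, total_steps, rng):
    """Map a training step to a (hypothetical) task configuration."""
    # Training progress t in [0, 1].
    t = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    # Prior strength decays linearly from 0.8 to 0.1 of pairs revealed.
    prior_strength = 0.8 * (1.0 - t) + 0.1 * t
    # Graphs grow from 5 to 30 variables and from degree 1 to 4.
    num_vars = int(round(5 + t * 25))
    expected_degree = 1.0 + t * 3.0
    # Unlock mechanism families progressively: only "linear" at t = 0.
    unlocked = MECHANISMS[: 1 + int(t * (len(MECHANISMS) - 1) + 1e-9)]
    return {
        "prior_strength": prior_strength,
        "num_vars": num_vars,
        "expected_degree": expected_degree,
        "mechanism": rng.choice(unlocked),
    }


def sample_dag(num_vars, expected_degree, rng):
    """Erdos-Renyi DAG: edges only go forward along a random ordering."""
    order = list(range(num_vars))
    rng.shuffle(order)
    # Each of the n-1 possible neighbours is linked with probability p,
    # so a node's expected total (in + out) degree is expected_degree.
    p = min(1.0, expected_degree / max(num_vars - 1, 1))
    adj = [[0] * num_vars for _ in range(num_vars)]
    for i in range(num_vars):
        for j in range(i + 1, num_vars):
            if rng.random() < p:
                adj[order[i]][order[j]] = 1  # edge order[i] -> order[j]
    return adj


def weak_prior(adj, strength, rng):
    """Reveal each off-diagonal entry of adj with probability `strength`."""
    n = len(adj)
    prior = [[UNKNOWN] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and rng.random() < strength:
                prior[i][j] = adj[i][j]
    return prior


def make_pretraining_task(step, total_steps, rng):
    """Sample one task: config, ground-truth DAG, and partial prior."""
    cfg = curriculum_config(step, total_steps, rng)
    adj = sample_dag(cfg["num_vars"], cfg["expected_degree"], rng)
    prior = weak_prior(adj, cfg["prior_strength"], rng)
    return cfg, adj, prior


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 500, 999):
        cfg, adj, prior = make_pretraining_task(step, 1000, rng)
        revealed = sum(v != UNKNOWN for row in prior for v in row)
        print(step, cfg["num_vars"], cfg["mechanism"], revealed)
```

Under this assumed encoding, a weak prior is simply a partially observed adjacency matrix that a model could consume alongside the observational data. Decaying prior strength over training, as done here, is only one option; sampling it from a range that widens over training would be an equally plausible reading of "varying prior strengths."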