Fast Flow Matching based Conditional Independence Tests for Causal Discovery
This work addresses a computational bottleneck for researchers and practitioners using constraint-based causal discovery methods, offering incremental improvements in speed and efficiency.
The paper tackles the high computational cost of conditional independence tests in constraint-based causal discovery by proposing the Flow Matching-based Conditional Independence Test (FMCIT), which requires only a single model training for the entire discovery procedure. FMCIT is integrated into a two-stage framework (GPC-FMCIT) that bounds the number of CI queries while maintaining high power, and experiments show favorable accuracy-efficiency trade-offs.
Constraint-based causal discovery methods require a large number of conditional independence (CI) tests, and the resulting computational cost severely limits their practical applicability. Accelerating each individual test is therefore crucial. To this end, we propose the Flow Matching-based Conditional Independence Test (FMCIT). The proposed test leverages the computational efficiency of flow matching and requires the model to be trained only once for the entire causal discovery procedure, which substantially reduces the overall running time. In numerical experiments, FMCIT controls the Type I error and maintains high power under the alternative hypothesis, even with high-dimensional conditioning sets. We further integrate FMCIT into a two-stage guided PC skeleton learning framework, termed GPC-FMCIT, which combines fast screening with guided, budgeted refinement using FMCIT. This design yields explicit bounds on the number of CI queries while maintaining high statistical power. Experiments on synthetic and real-world causal discovery tasks demonstrate favorable accuracy-efficiency trade-offs over existing CI testing methods and PC variants.
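To make the idea of fast screening followed by budgeted refinement concrete, the following is a minimal sketch and not the GPC-FMCIT algorithm itself, whose details the abstract does not give. It assumes a Fisher z partial-correlation test as the cheap screening test, conditioning sets of size at most `max_cond`, and a ranking rule that sends the least certain surviving edges to the expensive test first. The names `fisher_z_pvalue`, `two_stage_skeleton`, `refine_test`, `budget`, and `max_cond` are all hypothetical; `refine_test` stands in for a stronger test such as FMCIT.

```python
import itertools
import math

import numpy as np


def fisher_z_pvalue(data, i, j, cond):
    """Cheap screening test (assumed): Gaussian partial-correlation CI test."""
    idx = [i, j] + list(cond)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)
    # Partial correlation of the first two variables given the rest.
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = min(max(r, -0.999999), 0.999999)
    dof = max(data.shape[0] - len(cond) - 3, 1)
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(dof)
    # Two-sided p-value under the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def two_stage_skeleton(data, refine_test, alpha=0.05, budget=10, max_cond=1):
    """Return (edges, number of refine_test calls); calls never exceed budget."""
    d = data.shape[1]
    survivors = []  # (largest cheap p-value, i, j, conditioning set giving it)
    # Stage 1: screen every pair with the cheap test.
    for i, j in itertools.combinations(range(d), 2):
        others = [k for k in range(d) if k != i and k != j]
        worst_p, worst_cond = -1.0, ()
        for size in range(max_cond + 1):
            for cond in itertools.combinations(others, size):
                p = fisher_z_pvalue(data, i, j, cond)
                if p > worst_p:
                    worst_p, worst_cond = p, cond
        if worst_p <= alpha:  # independence rejected for every set tried
            survivors.append((worst_p, i, j, worst_cond))

    # Stage 2: spend the budget on the least certain survivors first.
    survivors.sort(reverse=True)
    skeleton, queries = set(), 0
    for _, i, j, cond in survivors:
        if queries < budget:
            queries += 1
            if refine_test(data, i, j, cond) > alpha:
                continue  # the stronger test judges the pair independent
        skeleton.add((i, j))
    return skeleton, queries


# Toy chain X0 -> X1 -> X2; the cheap test doubles as the refinement test here.
rng = np.random.default_rng(0)
x0 = rng.normal(size=2000)
x1 = x0 + rng.normal(size=2000)
x2 = x1 + rng.normal(size=2000)
data = np.column_stack([x0, x1, x2])
skeleton, queries = two_stage_skeleton(data, fisher_z_pvalue, budget=1)
```

In this sketch the expensive test is called at most `budget` times by construction, which is one simple way to obtain an explicit bound on the number of refined CI queries. Edges that survive screening but fall outside the budget are kept unrefined.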