LG, AI, ET · May 8

Physical Simulators as Do-Operators: Causal Discovery under Latent Confounders for AI-for-Science

arXiv:2605.07467
AI Analysis

CFM-SD targets causal discovery in AI-for-Science settings such as molecular design and materials science, where latent confounders are common and real interventions are expensive.

CFM-SD uses physical simulators as do-operators to perform causal discovery under latent confounders. It achieves an average F1 of 0.800 vs. 0.127–0.562 for baselines on synthetic data, and a 57–58% bias reduction on two real scientific tasks: molecular toxicity prediction and battery electrolyte optimization.

Existing interventional causal discovery methods -- IGSP, DCDI, ENCO -- assume causal sufficiency (no latent confounders) and rely on virtual interventions in synthetic simulators. In AI-for-Science settings such as molecular design and materials science, latent confounders are ubiquitous and real interventions (e.g., physics-based simulations) require hours to days per data point. We propose CFM-SD (Causal Flow Matching with Simulation Data), which uses first-principles physical simulators as do-operators in Pearl's interventional calculus to simultaneously handle latent confounders and real interventional data. Theoretically, a $d$-variable causal structure is identifiable with $O(d)$ single-variable interventions -- the minimum under physical realizability constraints. In Intrinsic Evaluation on synthetic data ($\gamma=0.2$--$0.8$), CFM-SD achieves an average F1$=0.800$ vs. F1$=0.127$--$0.562$ for all baselines. In Extrinsic Evaluation on real scientific data, CFM-SD achieves 57--58\% bias reduction in molecular toxicity prediction and battery electrolyte optimization, demonstrating practical value beyond synthetic benchmarks.
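The idea of treating a simulator as a do-operator can be illustrated with a toy example. The sketch below is not the CFM-SD algorithm; it is a minimal two-variable illustration under assumed settings. The function `simulate`, the linear mechanisms, and the coefficients 2.0 and 3.0 are all hypothetical. A latent variable `u` drives both `x` and `y`, so a regression on observational data overstates the effect of `x` on `y`, while single-variable interventions through the simulator recover the true effect and show that `y` does not cause `x`.

```python
import random


def simulate(n, rng, do=None):
    """Toy stand-in for a physical simulator (hypothetical mechanisms).

    A latent confounder u drives both x and y; the true causal effect
    of x on y is 2.0. Passing do={"x": v} or do={"y": v} replaces that
    variable's mechanism with the constant v, i.e. do(X=v) or do(Y=v).
    """
    do = do or {}
    rows = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # latent, never returned to the caller
        x = do["x"] if "x" in do else u + rng.gauss(0.0, 1.0)
        y = do["y"] if "y" in do else 2.0 * x + 3.0 * u + rng.gauss(0.0, 1.0)
        rows.append((x, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def ols_slope(rows):
    """Least-squares slope of y on x."""
    mx = mean(x for x, _ in rows)
    my = mean(y for _, y in rows)
    cov = sum((x - mx) * (y - my) for x, y in rows)
    var = sum((x - mx) ** 2 for x, _ in rows)
    return cov / var


rng = random.Random(0)
n = 20000

# Observational data: the slope absorbs the confounding path through u.
# In expectation it is 2.0 + 3.0 * Cov(x, u) / Var(x) = 3.5, not 2.0.
obs_slope = ols_slope(simulate(n, rng))

# Intervening on x cuts the u -> x path, so the contrast
# E[y | do(x=1)] - E[y | do(x=0)] estimates the true effect (about 2.0).
effect_x_on_y = (mean(y for _, y in simulate(n, rng, do={"x": 1.0}))
                 - mean(y for _, y in simulate(n, rng, do={"x": 0.0})))

# Intervening on y leaves x unchanged (contrast about 0.0),
# which rules out an edge y -> x.
effect_y_on_x = (mean(x for x, _ in simulate(n, rng, do={"y": 1.0}))
                 - mean(x for x, _ in simulate(n, rng, do={"y": 0.0})))

print(f"observational slope:  {obs_slope:.2f}")
print(f"effect of do(x) on y: {effect_x_on_y:.2f}")
print(f"effect of do(y) on x: {effect_y_on_x:.2f}")
```

In this toy setting, one intervention per variable is enough to orient the single edge, which loosely mirrors the abstract's claim about $O(d)$ single-variable interventions. A real physical simulator would be far more expensive per call, which is why the number of interventions matters.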

