LG Oct 2, 2025

Large-Scale Bayesian Causal Discovery with Interventional Data

arXiv:2510.01562v1
1 citation
h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses the need for scalable, uncertainty-aware causal discovery methods in fields such as genomics. It is an incremental advance, since it builds on existing approaches that use interventional data.

The authors address the problem of inferring causal relationships from interventional data, where existing methods scale poorly and do not quantify uncertainty. They propose Interventional Bayesian Causal Discovery (IBCD), report superior structure recovery over existing baselines in simulations, and apply it to CRISPR perturbation (Perturb-seq) data on 521 genes.

Inferring the causal relationships among a set of variables in the form of a directed acyclic graph (DAG) is an important but notoriously challenging problem. Recently, advancements in high-throughput genomic perturbation screens have inspired development of methods that leverage interventional data to improve model identification. However, existing methods still suffer from poor performance on large-scale tasks and fail to quantify uncertainty. Here, we propose Interventional Bayesian Causal Discovery (IBCD), an empirical Bayesian framework for causal discovery with interventional data. Our approach models the likelihood of the matrix of total causal effects, which can be approximated by a matrix normal distribution, rather than the full data matrix. We place a spike-and-slab horseshoe prior on the edges and separately learn data-driven weights for scale-free and Erdős-Rényi structures from observational data, treating each edge as a latent variable to enable uncertainty-aware inference. Through extensive simulation, we show that IBCD achieves superior structure recovery compared to existing baselines. We apply IBCD to CRISPR perturbation (Perturb-seq) data on 521 genes, demonstrating that edge posterior inclusion probabilities enable identification of robust graph structures.
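
To make two terms from the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a linear model in which the total causal effect of one variable on another is the sum, over all directed paths, of the product of the direct edge weights along each path. The function names (total_effects, select_edges), the 0.5 cutoff, and every number below are hypothetical and chosen for illustration only.

```python
# Illustrative sketch only. The linear-effects assumption, the function names,
# the 0.5 cutoff, and all numeric values are hypothetical, not from the paper.

def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def total_effects(direct):
    """Return direct + direct^2 + ... + direct^(n-1).

    direct[i][j] is the direct effect of variable i on variable j.
    Entry [i][j] of the result sums the weights of all directed paths i -> j;
    in a DAG on n nodes no path has more than n - 1 edges.
    """
    n = len(direct)
    total = [row[:] for row in direct]
    power = [row[:] for row in direct]
    for _ in range(n - 2):
        power = matmul(power, direct)
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    return total

def select_edges(pip, threshold=0.5):
    """Keep edges (i, j) whose posterior inclusion probability meets the cutoff."""
    return [(i, j)
            for i, row in enumerate(pip)
            for j, p in enumerate(row)
            if i != j and p >= threshold]

# Hypothetical 3-variable DAG: 0 -> 1 (0.5), 1 -> 2 (2.0), 0 -> 2 (0.3).
direct = [[0.0, 0.5, 0.3],
          [0.0, 0.0, 2.0],
          [0.0, 0.0, 0.0]]
total = total_effects(direct)
# Total effect of 0 on 2 is the direct edge plus the path through 1.
print(total[0][2])

# Hypothetical posterior inclusion probabilities for each candidate edge.
pip = [[0.00, 0.97, 0.12],
       [0.02, 0.00, 0.88],
       [0.01, 0.05, 0.00]]
print(select_edges(pip))
```

The first part shows why a total-effects matrix differs from the graph itself: the entry for 0 on 2 combines the direct edge with the indirect path through 1. The second part shows one simple way per-edge inclusion probabilities could be turned into a graph; a stricter cutoff would keep fewer, more robust edges.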

