Learning to refine domain knowledge for biological network inference
This work addresses the problem that biological knowledge graphs are biased or incomplete. It offers biologists an incremental improvement by combining two existing strategies, knowledge-graph priors and amortized causal structure learning, for more accurate causal inference.
The paper tackles the challenge of inferring causal biological networks from sparse, high-dimensional perturbation data. It proposes an amortized algorithm that refines domain knowledge using observed data. With limited interventional data, the approach outperforms baselines both in recovering ground-truth causal graphs and in identifying errors in the prior knowledge.
Perturbation experiments allow biologists to discover causal relationships between variables of interest, but the sparsity and high dimensionality of these data pose significant challenges for causal structure learning algorithms. Biological knowledge graphs can bootstrap the inference of causal structures in these situations, but because they compile highly diverse information, they can bias predictions towards well-studied systems. Alternatively, amortized causal structure learning algorithms encode inductive biases through data simulation and train supervised models to recapitulate these synthetic graphs. However, realistically simulating biology is arguably even harder than understanding a specific system. In this work, we take inspiration from both strategies and propose an amortized algorithm for refining domain knowledge based on observed data. On real and synthetic datasets, we show that our approach outperforms baselines in recovering ground-truth causal graphs and in identifying errors in the prior knowledge with limited interventional data.
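To make the amortized setting more concrete, the following is a minimal sketch of how one synthetic training example for a knowledge-refinement model might be generated: a random causal graph, data simulated from it, and a corrupted copy of the graph standing in for imperfect prior knowledge. This is an illustration under stated assumptions, not the paper's actual pipeline: the linear-Gaussian simulator, the edge-flip noise model, and all function names and parameters (`sample_dag`, `simulate_linear_sem`, `corrupt_graph`, `make_training_example`, `flip_prob`, and so on) are hypothetical.

```python
import numpy as np

def sample_dag(n_nodes, edge_prob, rng):
    """Sample a random DAG as an upper-triangular adjacency matrix (edge i -> j only if i < j)."""
    adj = rng.random((n_nodes, n_nodes)) < edge_prob
    return np.triu(adj, k=1).astype(int)

def simulate_linear_sem(adj, n_samples, rng):
    """Draw observational samples from a linear-Gaussian SEM with structure adj (an assumed simulator)."""
    n = adj.shape[0]
    weights = adj * rng.uniform(0.5, 2.0, size=adj.shape)
    data = np.zeros((n_samples, n))
    # Columns are already in topological order because adj is upper triangular.
    for j in range(n):
        data[:, j] = data @ weights[:, j] + rng.normal(size=n_samples)
    return data

def corrupt_graph(adj, flip_prob, rng):
    """Mimic imperfect prior knowledge by flipping off-diagonal entries (assumed noise model).

    The corrupted prior is not guaranteed to be acyclic.
    """
    flips = rng.random(adj.shape) < flip_prob
    np.fill_diagonal(flips, False)
    return np.where(flips, 1 - adj, adj)

def make_training_example(n_nodes=10, edge_prob=0.2, flip_prob=0.1, n_samples=200, seed=0):
    """Return (data, noisy prior graph, true graph) for one synthetic system."""
    rng = np.random.default_rng(seed)
    true_adj = sample_dag(n_nodes, edge_prob, rng)
    data = simulate_linear_sem(true_adj, n_samples, rng)
    prior_adj = corrupt_graph(true_adj, flip_prob, rng)
    return data, prior_adj, true_adj
```

Under these assumptions, a supervised model would take `data` and `prior_adj` as input and be trained to predict `true_adj`. Entries where the prediction disagrees with `prior_adj` would then correspond to suspected errors in the prior knowledge.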