In-silico biological discovery with large perturbation models
This work aims to support biological discovery by improving the integration of heterogeneous perturbation data. The contribution may be incremental, since it builds on existing deep-learning approaches.
The authors address the challenge of integrating diverse perturbation experiments by developing the Large Perturbation Model (LPM), which outperforms existing methods on tasks such as predicting post-perturbation transcriptomes and identifying shared molecular mechanisms of action.
Data generated in perturbation experiments link perturbations to the changes they elicit and therefore contain information relevant to numerous biological discovery tasks -- from understanding the relationships between biological entities to developing therapeutics. However, these data encompass diverse perturbations and readouts, and the complex dependence of experimental outcomes on their biological context makes it challenging to integrate insights across experiments. Here, we present the Large Perturbation Model (LPM), a deep-learning model that integrates multiple, heterogeneous perturbation experiments by representing perturbation, readout, and context as disentangled dimensions. LPM outperforms existing methods across multiple biological discovery tasks, including in predicting post-perturbation transcriptomes of unseen experiments, identifying shared molecular mechanisms of action between chemical and genetic perturbations, and facilitating the inference of gene-gene interaction networks.
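To make the idea of representing perturbation, readout, and context as separate dimensions more concrete, the following is a minimal sketch rather than the LPM architecture itself. It assumes each of the three dimensions gets its own embedding table and that a small network maps the concatenated embeddings to a predicted outcome. The class name `TripletPredictor`, the embedding and hidden sizes, the network shape, and all labels such as `compound_X` are hypothetical placeholders. The weights are random and untrained, so the output has no biological meaning.

```python
import math
import random


class TripletPredictor:
    """Toy (perturbation, readout, context) -> scalar predictor.

    Hypothetical illustration only: not the authors' model, and untrained.
    """

    def __init__(self, perturbations, readouts, contexts, dim=4, hidden=8, seed=0):
        rng = random.Random(seed)

        def table(keys):
            # One independent embedding vector per symbol.
            return {k: [rng.gauss(0.0, 0.5) for _ in range(dim)] for k in keys}

        # Separate tables keep the three dimensions disentangled.
        self.p_emb = table(perturbations)
        self.r_emb = table(readouts)
        self.c_emb = table(contexts)
        # Small one-hidden-layer network over the concatenated embeddings.
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(3 * dim)] for _ in range(hidden)]
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(hidden)]

    def predict(self, perturbation, readout, context):
        # List concatenation: [perturbation | readout | context].
        x = self.p_emb[perturbation] + self.r_emb[readout] + self.c_emb[context]
        h = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.w1]
        return sum(w * v for w, v in zip(self.w2, h))


# Placeholder labels, not real experiments.
model = TripletPredictor(
    perturbations=["knockout_geneA", "compound_X"],
    readouts=["expr_geneB", "viability"],
    contexts=["cell_line_1", "cell_line_2"],
)

# Any combination of known symbols can be queried, including one that
# never occurred together in a single experiment.
y = model.predict("compound_X", "expr_geneB", "cell_line_2")
```

Because each symbol is embedded independently of the others, a combination that was never measured together can still be queried, which is the property that would make predictions for unseen experiments possible in a trained model of this kind.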