MapPFN: Learning Causal Perturbation Maps in Context
This work addresses the limited adaptability of treatment-effect models in biology. It offers an incremental improvement by pretraining on synthetic data so that predictions can adapt to new biological contexts.
The paper tackles the challenge of predicting perturbation effects in biological systems when real data are scarce. It introduces MapPFN, a meta-learning approach that is pretrained on synthetic data and uses in-context learning to adapt to unseen contexts. In identifying differentially expressed genes, MapPFN achieves performance comparable to models trained on real single-cell data.
Planning effective interventions in biological systems requires treatment-effect models that adapt to unseen biological contexts by identifying their specific underlying mechanisms. Yet single-cell perturbation datasets span only a handful of biological contexts, and existing methods cannot leverage new interventional evidence at inference time to adapt beyond their training data. To meta-learn a perturbation effect estimator, we present MapPFN, a prior-data fitted network (PFN) pretrained on synthetic data generated from a prior over causal perturbations. Given a set of experiments, MapPFN uses in-context learning to predict post-perturbation distributions, without gradient-based optimization. Despite being pretrained on in silico gene knockouts alone, MapPFN identifies differentially expressed genes, matching the performance of models trained on real single-cell data. Our code and data are available at https://github.com/marvinsxtr/MapPFN.
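To make the idea of "synthetic data generated from a prior over causal perturbations" concrete, the following is a minimal sketch of how one pretraining task might be sampled. It is not the authors' prior or code: the linear-Gaussian structural causal model, the function names (`sample_scm`, `sample_cells`, `sample_task`), and all parameters are assumptions chosen for illustration. Each task draws a random causal graph over genes, simulates unperturbed cells, and simulates in silico knockouts by clamping one gene to zero. Some knockouts serve as in-context experiments and one is held out as the query whose post-perturbation distribution a model would have to predict.

```python
import random


def sample_scm(n_genes, edge_prob, rng):
    """Sample weights of a random linear SCM (hypothetical prior).

    Gene i may only depend on genes j < i, so the graph is a DAG.
    """
    weights = [[0.0] * n_genes for _ in range(n_genes)]
    for i in range(n_genes):
        for j in range(i):
            if rng.random() < edge_prob:
                weights[i][j] = rng.uniform(-1.0, 1.0)
    return weights


def sample_cells(weights, n_cells, rng, knockout=None):
    """Draw cells by ancestral sampling.

    If `knockout` is a gene index, that gene is clamped to 0,
    which mimics a do-intervention (an in silico knockout).
    """
    n_genes = len(weights)
    cells = []
    for _ in range(n_cells):
        x = [0.0] * n_genes
        for i in range(n_genes):
            if i == knockout:
                x[i] = 0.0
            else:
                parents = sum(weights[i][j] * x[j] for j in range(i))
                x[i] = parents + rng.gauss(0.0, 1.0)
        cells.append(x)
    return cells


def sample_task(n_genes=5, n_cells=200, edge_prob=0.5, seed=0):
    """Sample one synthetic task: context experiments plus a held-out query."""
    rng = random.Random(seed)
    weights = sample_scm(n_genes, edge_prob, rng)
    observational = sample_cells(weights, n_cells, rng)
    query_gene = 0  # held-out knockout the model would have to predict
    context = {
        g: sample_cells(weights, n_cells, rng, knockout=g)
        for g in range(1, n_genes)
    }
    target = sample_cells(weights, n_cells, rng, knockout=query_gene)
    return observational, context, query_gene, target


observational, context, query_gene, target = sample_task()
print(len(observational), sorted(context), query_gene, len(target))
```

In this sketch, a PFN-style model would be trained across many such tasks to map the observational cells, the context knockouts, and the query gene to the target cell distribution. At inference time the same forward pass would be applied to real experiments, with no gradient updates.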