LG · May 11

Many Needles in a Haystack: Active Hit Discovery for Perturbation Experiments

Andrea Rubbi, Arpit Merchant, Samuel Ogden, Amir Akbarnejad, Pietro Liò, Sattar Vakili, Mo Lotfollahi

arXiv:2605.10196 · 18.1

Predicted impact: top 71% in LG · last 90 days · Originality: Incremental advance

AI Analysis

For biologists running high-throughput gene perturbation experiments on limited budgets, this method improves hit-discovery efficiency by optimizing directly for threshold exceedance rather than searching for a single global optimum.

The paper formalizes hit discovery in high-throughput perturbation experiments as a sequential experimental design problem. It proposes Probability-of-Hit, an acquisition function that directly targets threshold exceedance, and reports up to a 6.4% improvement over baselines on the Schmidt IL-2 dataset.
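Under the assumption of a Gaussian posterior surrogate (for example a Gaussian process; the summary does not name the model), "posterior probability of being a hit" reduces to a Gaussian tail probability. The notation here is ours, not the paper's: $\tau$ is the hit threshold, $\mathcal{D}_t$ the data observed after round $t$, $\mu_t$ and $\sigma_t$ the posterior mean and standard deviation, and $\Phi$ the standard normal CDF.

$$
\mathrm{PoH}_t(x) = \Pr\big(f(x) > \tau \mid \mathcal{D}_t\big) = \Phi\!\left(\frac{\mu_t(x) - \tau}{\sigma_t(x)}\right),
\qquad
x_{t+1} = \arg\max_{x \notin \mathcal{D}_t} \mathrm{PoH}_t(x)
$$

Unlike improvement-based criteria, which compare a candidate against the best value seen so far, this score depends only on the fixed threshold $\tau$. A candidate in a secondary high-value region therefore scores as well as one near the global optimum, provided it is equally likely to clear $\tau$.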

High-throughput gene perturbation experiments can test several genetic interventions in parallel, yet experimental budgets remain limited. A central goal is hit discovery: identifying as many perturbations as possible whose phenotypic effect exceeds a predefined threshold. Pure exploration strategies are statistically inefficient, wasting budget on low-value regions. Bayesian optimization methods offer a principled alternative but target a single global optimum, over-exploiting dominant modes while neglecting other high-value regions. We formalize hit discovery as a sequential experimental design problem and propose Probability-of-Hit, an acquisition function that directly targets threshold exceedance by ranking candidates according to their posterior probability of being a hit. We prove asymptotic optimality of this approach and demonstrate strong empirical performance on both synthetic benchmarks and real biological immunology datasets, including up to 6.4% improvement over baselines on the Schmidt IL-2 dataset.
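The ranking step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a Gaussian posterior supplied as per-candidate means and standard deviations. The greedy top-k batch rule and the helper names (`probability_of_hit`, `select_batch`) are hypothetical choices made here for illustration.

```python
import math
from typing import AbstractSet, List, Sequence


def normal_cdf(z: float) -> float:
    """Standard normal CDF computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_hit(mean: float, std: float, threshold: float) -> float:
    """P(f(x) > threshold) under an assumed Gaussian posterior N(mean, std^2)."""
    if std <= 0.0:
        # Degenerate posterior: the outcome is treated as known exactly.
        return 1.0 if mean > threshold else 0.0
    return normal_cdf((mean - threshold) / std)


def select_batch(
    means: Sequence[float],
    stds: Sequence[float],
    threshold: float,
    batch_size: int,
    tested: AbstractSet[int] = frozenset(),
) -> List[int]:
    """Hypothetical greedy batch rule: return the indices of the untested
    candidates with the highest probability of exceeding the threshold."""
    scored = [
        (probability_of_hit(m, s, threshold), i)
        for i, (m, s) in enumerate(zip(means, stds))
        if i not in tested
    ]
    # Sort by score only; Python's sort is stable, so ties keep index order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:batch_size]]


if __name__ == "__main__":
    means = [2.0, 0.5, 1.2, 3.0]
    stds = [0.1, 2.0, 0.5, 1.5]
    print(select_batch(means, stds, threshold=1.0, batch_size=2))  # [0, 3]
```

In a full sequential design loop, the selected perturbations would be run, the surrogate refit on the new measurements, and the ranking repeated until the budget is spent. The score never rewards exceeding the threshold by a large margin. A candidate predicted just above $\tau$ with low uncertainty can therefore outrank one with a higher but very uncertain mean.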

