Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders
This work addresses efficient drug and protein design, offering a practical method that builds incrementally on existing Bayesian optimization and autoencoder techniques.
The paper tackles the challenge of applying Bayesian optimization to discrete, high-dimensional biological sequence design. It develops LaMBO, which jointly trains a denoising autoencoder with a multi-task Gaussian process head so that multi-objective acquisition functions can be optimized by gradient in the autoencoder's latent space. In the reported experiments LaMBO outperforms genetic optimizers on small-molecule and fluorescent protein tasks.
Bayesian optimization (BayesOpt) is a gold standard for query-efficient continuous optimization. However, its adoption for drug design has been hindered by the discrete, high-dimensional nature of the decision variables. We develop a new approach (LaMBO) which jointly trains a denoising autoencoder with a discriminative multi-task Gaussian process head, allowing gradient-based optimization of multi-objective acquisition functions in the latent space of the autoencoder. These acquisition functions allow LaMBO to balance the explore-exploit tradeoff over multiple design rounds, and to balance objective tradeoffs by optimizing sequences at many different points on the Pareto frontier. We evaluate LaMBO on two small-molecule design tasks, and introduce new tasks optimizing \emph{in silico} and \emph{in vitro} properties of large-molecule fluorescent proteins. In our experiments LaMBO outperforms genetic optimizers and does not require a large pretraining corpus, demonstrating that BayesOpt is practical and effective for biological sequence design.
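To make the notion of a Pareto frontier concrete, the following is a minimal sketch of non-dominated filtering for objectives that are all maximized. It is not LaMBO's implementation: the function names `dominates` and `pareto_front` and the example scores are hypothetical, chosen only to illustrate which candidates count as lying on the frontier.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and
    strictly better on at least one (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective_1, objective_2) scores for five candidate sequences.
scores = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9), (0.4, 0.4), (0.9, 0.05)]
front = pareto_front(scores)
print(front)
```

Here `(0.4, 0.4)` is dropped because `(0.5, 0.5)` beats it on both objectives, and `(0.9, 0.05)` is dropped because `(0.9, 0.1)` matches it on the first objective and beats it on the second. The three remaining points each represent a different tradeoff, which is the sense in which a method can target many different points on the frontier.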