LG, NE, QM, ML | Mar 23, 2022

Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders

arXiv:2203.12742v2 | 140 citations | h-index: 61
Originality: Incremental advance
AI Analysis

This work addresses efficient drug and protein design. It offers a practical method and is an incremental advance that builds on existing Bayesian optimization and autoencoder techniques.

The paper tackles the challenge of applying Bayesian optimization to discrete, high-dimensional biological sequence design. It develops LaMBO, which jointly trains a denoising autoencoder with a multi-task Gaussian process head so that acquisition functions can be optimized with gradients in the autoencoder's latent space. In the reported experiments, LaMBO outperforms genetic optimizers on small-molecule and fluorescent protein design tasks.
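To make the idea of gradient-based acquisition optimization in a latent space concrete, here is a minimal sketch. It is not the LaMBO implementation: it assumes a single objective, an RBF kernel, an upper-confidence-bound acquisition, and random vectors standing in for encoded sequences. The names `ToyGP`, `ucb_and_grad`, and `ascend` are hypothetical, and the encoder and decoder are omitted entirely.

```python
import numpy as np

def rbf(a, b, ell=1.0):
    """RBF kernel matrix between rows of a (n, d) and b (m, d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

class ToyGP:
    """Single-output GP on latent points (assumption: unit prior variance)."""

    def __init__(self, Z, y, ell=1.0, noise=1e-3):
        self.Z, self.ell = Z, ell
        K = rbf(Z, Z, ell) + noise * np.eye(len(Z))
        self.K_inv = np.linalg.inv(K)
        self.alpha = self.K_inv @ y

    def ucb_and_grad(self, z, beta=1.0):
        """Return UCB(z) = mean + beta * std and its analytic gradient in z."""
        k = rbf(z[None, :], self.Z, self.ell)[0]        # (n,)
        dk = k[:, None] * (self.Z - z) / self.ell**2     # (n, d), d k_i / d z
        v = self.K_inv @ k
        mean = k @ self.alpha
        std = np.sqrt(max(1.0 - k @ v, 1e-12))
        grad_mean = dk.T @ self.alpha
        grad_std = -(dk.T @ v) / std
        return mean + beta * std, grad_mean + beta * grad_std

def ascend(gp, z0, steps=50, lr=0.1):
    """Gradient ascent with step halving; only improving steps are accepted."""
    z = z0.copy()
    val, grad = gp.ucb_and_grad(z)
    for _ in range(steps):
        step = lr
        while step > 1e-6:
            cand = z + step * grad
            cand_val, cand_grad = gp.ucb_and_grad(cand)
            if cand_val > val:
                z, val, grad = cand, cand_val, cand_grad
                break
            step *= 0.5
        else:
            break  # no improving step found; treat as converged
    return z, val

# Toy data: random "latent codes" with a synthetic objective (assumption).
rng = np.random.default_rng(0)
Z = rng.normal(size=(8, 2))
y = np.sin(Z[:, 0]) + np.cos(Z[:, 1])
gp = ToyGP(Z, y)
z0 = Z[0]
start_val, _ = gp.ucb_and_grad(z0)
z_star, best_val = ascend(gp, z0)
```

In a full pipeline of the kind the summary describes, the optimized latent point would be passed through the autoencoder's decoder to obtain a candidate sequence; that step is outside this sketch.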

Bayesian optimization (BayesOpt) is a gold standard for query-efficient continuous optimization. However, its adoption for drug design has been hindered by the discrete, high-dimensional nature of the decision variables. We develop a new approach (LaMBO) which jointly trains a denoising autoencoder with a discriminative multi-task Gaussian process head, allowing gradient-based optimization of multi-objective acquisition functions in the latent space of the autoencoder. These acquisition functions allow LaMBO to balance the explore-exploit tradeoff over multiple design rounds, and to balance objective tradeoffs by optimizing sequences at many different points on the Pareto frontier. We evaluate LaMBO on two small-molecule design tasks, and introduce new tasks optimizing in silico and in vitro properties of large-molecule fluorescent proteins. In our experiments LaMBO outperforms genetic optimizers and does not require a large pretraining corpus, demonstrating that BayesOpt is practical and effective for biological sequence design.
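The abstract refers to sequences at different points on the Pareto frontier. As a small illustration of that term only, the sketch below filters a set of candidate objective vectors down to the non-dominated ones. It assumes every objective is to be maximized; the function name `pareto_front` and the example scores are hypothetical and do not come from the paper.

```python
def pareto_front(points):
    """Return the non-dominated points, assuming all objectives are maximized."""

    def dominates(a, b):
        # a dominates b if it is at least as good everywhere and better somewhere.
        at_least_as_good = all(x >= y for x, y in zip(a, b))
        strictly_better = any(x > y for x, y in zip(a, b))
        return at_least_as_good and strictly_better

    return [p for p in points if not any(dominates(q, p) for q in points)]

# Made-up two-objective scores for five candidate designs.
candidates = [(1.0, 5.0), (2.0, 4.0), (2.0, 2.0), (3.0, 1.0), (0.5, 0.5)]
front = pareto_front(candidates)
```

Here `(2.0, 2.0)` is dropped because `(2.0, 4.0)` matches it on the first objective and beats it on the second, and `(0.5, 0.5)` is dropped because several candidates beat it on both. The remaining points each represent a different tradeoff between the two objectives.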

Code Implementations: 2 repos
