LGBM
Sep 5, 2025

Directed Evolution of Proteins via Bayesian Optimization in Embedding Space

arXiv:2509.04998v1
h-index: 3
BIBM
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of efficient protein engineering, but the contribution appears incremental because it builds on existing machine-learning-assisted directed evolution techniques.

The paper tackles the problem of expensive and time-consuming protein design by combining Bayesian optimization with protein language model embeddings. The authors report that the embedding-based representation improves the performance of Bayesian optimization and that the method outperforms state-of-the-art machine-learning-assisted directed evolution methods for the same screening effort.

Directed evolution is an iterative laboratory process for designing proteins with improved function: new protein variants are repeatedly synthesized, and their desired property is evaluated with expensive and time-consuming biochemical screening. Machine learning methods can help select informative or promising variants for screening, increasing their quality and reducing the amount of screening required. In this paper, we present a novel method for machine-learning-assisted directed evolution of proteins that combines Bayesian optimization with an informative representation of protein variants extracted from a pre-trained protein language model. We demonstrate that the new representation, based on sequence embeddings, significantly improves the performance of Bayesian optimization, yielding better results with the same total number of screenings. At the same time, our method outperforms state-of-the-art machine-learning-assisted directed evolution methods with a regression objective.
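To make the general idea concrete, here is a minimal sketch of Bayesian optimization over a fixed pool of variants represented by embedding vectors. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the embeddings are random placeholders standing in for protein language model outputs, `toy_fitness` is a synthetic stand-in for biochemical screening, and the Gaussian process surrogate with an RBF kernel and the upper-confidence-bound acquisition rule are generic choices that the abstract does not confirm.

```python
import math
import random

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two embedding vectors."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq / length_scale ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(X_train, y_train, X_query, noise=1e-2):
    """Gaussian process posterior (mean, std) at each query embedding."""
    mean_y = sum(y_train) / len(y_train)  # center targets; prior mean is zero
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X_train)]
         for i, a in enumerate(X_train)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, [y - mean_y for y in y_train]))
    out = []
    for q in X_query:
        k = [rbf(q, x) for x in X_train]
        mu = mean_y + sum(ki * ai for ki, ai in zip(k, alpha))
        v = solve_lower(L, k)
        var = max(rbf(q, q) - sum(vi * vi for vi in v), 1e-12)
        out.append((mu, math.sqrt(var)))
    return out

def run_bo(embeddings, screen, n_init=5, n_rounds=15, beta=2.0, seed=0):
    """Screen n_init random variants, then one UCB-selected variant per round."""
    rng = random.Random(seed)
    screened = rng.sample(range(len(embeddings)), n_init)
    scores = [screen(i) for i in screened]
    for _ in range(n_rounds):
        pool = [i for i in range(len(embeddings)) if i not in screened]
        post = gp_posterior([embeddings[i] for i in screened], scores,
                            [embeddings[i] for i in pool])
        # Upper confidence bound: favor high predicted fitness or high uncertainty.
        pick = max(zip(pool, post), key=lambda t: t[1][0] + beta * t[1][1])[0]
        screened.append(pick)
        scores.append(screen(pick))
    return screened, scores

# Hypothetical data: random 4-d vectors stand in for language model embeddings.
_rng = random.Random(42)
embeddings = [[_rng.gauss(0, 1) for _ in range(4)] for _ in range(80)]
target = [0.5, -0.5, 0.3, 0.0]

def toy_fitness(i):
    """Synthetic fitness in (0, 1]; a placeholder for a real screening assay."""
    sq = sum((e - t) ** 2 for e, t in zip(embeddings[i], target))
    return math.exp(-0.5 * sq)

screened, scores = run_bo(embeddings, toy_fitness)
print("best screened variant:", screened[scores.index(max(scores))])
```

In a real campaign, `embeddings` would hold one vector per candidate sequence computed with a pre-trained protein language model, and `screen` would return the measured property for the chosen variant. The screening budget is `n_init + n_rounds`, which is the quantity held fixed when methods are compared at equal screening effort.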

