LG | Jan 31, 2025

A Variational Perspective on Generative Protein Fitness Optimization

arXiv:2501.19200v2 | 3 citations | h-index: 8 | ICML
Originality: Highly original
AI Analysis

This paper addresses the problem of discovering enhanced protein variants for applications such as drug design, and proposes a novel method for a known bottleneck in protein optimization.

The paper tackles the challenge of optimizing protein fitness by introducing VLGPO, a variational method that embeds sequences in a continuous latent space for efficient sampling and achieves state-of-the-art results on two protein benchmarks.

The goal of protein fitness optimization is to discover new protein variants with enhanced fitness for a given use. The vast search space and the sparsely populated fitness landscape, along with the discrete nature of protein sequences, pose significant challenges when trying to determine the gradient towards configurations with higher fitness. We introduce Variational Latent Generative Protein Optimization (VLGPO), a variational perspective on fitness optimization. Our method embeds protein sequences in a continuous latent space to enable efficient sampling from the fitness distribution and combines a (learned) flow matching prior over sequence mutations with a fitness predictor to guide optimization towards sequences with high fitness. VLGPO achieves state-of-the-art results on two different protein benchmarks of varying complexity. Moreover, the variational design with explicit prior and likelihood functions offers a flexible plug-and-play framework that can be easily customized to suit various protein design tasks.
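The idea of combining a flow matching prior with a fitness predictor can be illustrated with a small toy sketch. This is not the authors' implementation: the 2-D latent space, the Gaussian "data" distribution (`MU`, `SIGMA`), the quadratic `fitness` function, the `TARGET` point, and the simple additive `guidance` weight are all assumptions made for illustration. A closed-form velocity field stands in for the learned prior network, and no sequence encoder or decoder is modeled. Adding the fitness gradient to the velocity is a heuristic form of guidance, not an exact posterior sampler.

```python
import numpy as np

# All constants below are hypothetical toy values, not taken from the paper.
MU = np.array([1.0, -1.0])      # mean of the toy "data" latents
SIGMA = 0.5                     # standard deviation of the toy "data" latents
TARGET = np.array([2.0, 0.0])   # latent point where the toy fitness peaks


def prior_velocity(z, t):
    """Closed-form flow matching velocity E[z_1 - z_0 | z_t = z].

    Transports N(0, I) to N(MU, SIGMA^2 I) along the straight-line path
    z_t = (1 - t) * z_0 + t * z_1. Stands in for a learned prior network.
    """
    var_t = (1.0 - t) ** 2 + (t * SIGMA) ** 2
    gain = (t * SIGMA ** 2 - (1.0 - t)) / var_t
    return MU + gain * (z - t * MU)


def fitness(z):
    """Toy differentiable fitness predictor, maximal at TARGET."""
    return -0.5 * np.sum((z - TARGET) ** 2, axis=-1)


def fitness_grad(z):
    """Gradient of the toy fitness with respect to the latent z."""
    return TARGET - z


def sample(n, guidance, steps=100, seed=0):
    """Euler-integrate the prior flow plus a fitness-gradient guidance term."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, 2))  # start from the noise distribution
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        z = z + dt * (prior_velocity(z, t) + guidance * fitness_grad(z))
    return z


if __name__ == "__main__":
    unguided = sample(2000, guidance=0.0)
    guided = sample(2000, guidance=1.0)
    print("mean fitness, prior only:", fitness(unguided).mean())
    print("mean fitness, guided:    ", fitness(guided).mean())
```

With `guidance=0.0` the samples follow the prior alone. Increasing `guidance` pulls samples toward higher predicted fitness at the cost of drifting away from the prior, which is the trade-off an explicit prior and likelihood make adjustable.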

