LGBM May 7, 2025

Guide your favorite protein sequence generative model

arXiv:2505.04823v3 | 4 citations | h-index: 11
Originality: Incremental advance
AI Analysis

ProteinGuide gives protein engineers a general way to incorporate experimental data into generative models, though the advance is incremental because it builds on existing models.

The authors address the problem of conditioning protein generative models on auxiliary information by developing ProteinGuide, a principled framework that unifies a broad class of these models. They use it to condition generation on properties such as stability, enzyme class, and CATH-labeled fold, and to design adenine base editor sequences for high activity.

Generative machine learning models on sequences are transforming protein engineering. However, no principled framework exists for conditioning these models on auxiliary information, such as experimental data, in a plug-and-play manner. Herein, we present ProteinGuide -- a principled and general method for conditioning -- by unifying a broad class of protein generative models under a single framework. We demonstrate the applicability of ProteinGuide by guiding two protein generative models, ProteinMPNN and ESM3, to generate amino acid and structure token sequences, conditioned on several user-specified properties such as enhanced stability, enzyme classes, and CATH-labeled folds. We also used ProteinGuide with inverse folding models and our own experimental assay to design adenine base editor sequences for high activity.
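
As a rough illustration of what plug-and-play conditioning can look like, the sketch below applies Bayes' rule at a single sequence position: a pretrained model's per-residue probabilities are reweighted by a property predictor's likelihoods and then renormalized. This is a toy under stated assumptions, not the ProteinGuide algorithm itself. The function names, the `strength` parameter, and the numbers in the example are all hypothetical and do not come from the paper.

```python
import random

# The 20 standard amino acids, in a fixed order shared by all probability lists.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def guided_distribution(prior_probs, property_likelihoods, strength=1.0):
    """Reweight p(x) by p(y | x) ** strength and renormalize.

    With strength == 1 this is Bayes' rule: p(x | y) is proportional to
    p(x) * p(y | x). The strength knob is an illustrative assumption;
    strength == 0 returns the unguided distribution.
    """
    if len(prior_probs) != len(property_likelihoods):
        raise ValueError("prior and likelihood lists must have the same length")
    weights = [p * (l ** strength)
               for p, l in zip(prior_probs, property_likelihoods)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("guided weights sum to zero; nothing to sample from")
    return [w / total for w in weights]


def sample_token(probs, rng):
    """Draw one amino acid according to the given probabilities."""
    return rng.choices(AMINO_ACIDS, weights=probs, k=1)[0]


# Hypothetical inputs: a uniform generative prior, and a made-up predictor
# that considers leucine (L) far more compatible with the desired property.
prior = [1.0 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
likelihoods = [0.9 if aa == "L" else 0.1 for aa in AMINO_ACIDS]

guided = guided_distribution(prior, likelihoods)
leucine = AMINO_ACIDS.index("L")
print(f"p(L) before guidance: {prior[leucine]:.3f}")   # 0.050
print(f"p(L) after guidance:  {guided[leucine]:.3f}")  # 0.321
print("sampled residue:", sample_token(guided, random.Random(0)))
```

In this toy setting the generative model is never retrained; only its output distribution is adjusted, which is the sense in which such conditioning is plug-and-play. A real sequence model would need this reweighting at every generation step, with a predictor that can score partially generated sequences.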

