LG | AI | QM | Jan 16, 2025

Large Language Model is Secretly a Protein Sequence Optimizer

arXiv:2501.09274v21 citations | h-index: 7
AI Analysis

This work addresses protein engineering for biotechnology, but its contribution is incremental: it applies existing LLMs to a new domain.

The paper tackles protein sequence engineering by showing that large language models (LLMs) can optimize protein sequences using a directed evolutionary method, achieving success on synthetic and experimental fitness landscapes.

We consider the protein sequence engineering problem, which aims to find protein sequences with high fitness levels, starting from a given wild-type sequence. Directed evolution has been a dominant paradigm in this field: an iterative process that generates variants and selects among them via experimental feedback. We demonstrate that large language models (LLMs), despite being trained on massive amounts of text, are secretly protein sequence optimizers. With a directed evolutionary method, LLMs can perform protein engineering through Pareto and experiment-budget constrained optimization, demonstrating success on both synthetic and experimental fitness landscapes.
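The generate-and-select loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's method: `propose_variants` is a hypothetical stand-in that uses random point mutations where the paper would query an LLM, `toy_fitness` is an invented oracle that replaces experimental feedback, and the sequences, pool size, and budget are arbitrary.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def propose_variants(parent, n, rng):
    """Hypothetical proposal step: n random single-point mutants of parent.

    Assumption: in the paper this role is played by an LLM; random mutation
    is swapped in here so the sketch runs without any model.
    """
    variants = []
    for _ in range(n):
        pos = rng.randrange(len(parent))
        choices = [a for a in AMINO_ACIDS if a != parent[pos]]
        variants.append(parent[:pos] + rng.choice(choices) + parent[pos + 1:])
    return variants


def toy_fitness(seq, target):
    """Toy oracle (assumption): fraction of positions matching a hidden target."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def directed_evolution(wild_type, fitness, rounds=20, pool_size=8,
                       children_per_parent=4, budget=500, seed=0):
    """Iteratively propose variants, score them, and keep the fittest pool.

    budget caps the number of fitness evaluations, mimicking a limited
    number of experiments.
    """
    rng = random.Random(seed)
    scored = {wild_type: fitness(wild_type)}
    evaluations = 1
    pool = [wild_type]
    for _ in range(rounds):
        for parent in pool:
            for child in propose_variants(parent, children_per_parent, rng):
                if evaluations >= budget:
                    break  # experiment budget exhausted
                if child in scored:
                    continue  # never pay twice for the same sequence
                scored[child] = fitness(child)
                evaluations += 1
        # Selection: the top-scoring sequences seen so far become the parents.
        pool = sorted(scored, key=scored.get, reverse=True)[:pool_size]
    best = max(scored, key=scored.get)
    return best, scored[best], evaluations


if __name__ == "__main__":
    target = "MKTAYIAKQR"     # invented hidden optimum
    wild_type = "MKTGYIGKQA"  # invented starting sequence
    best, score, used = directed_evolution(
        wild_type, lambda s: toy_fitness(s, target))
    print(best, score, used)
```

Because the wild type is always kept among the scored sequences, the returned fitness can never fall below the starting point. Swapping `propose_variants` for a model-backed proposer would leave the rest of the loop unchanged.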
