QM · AI · Apr 18

ProtoCycle: Reflective Tool-Augmented Planning for Text-Guided Protein Design

arXiv:2604.16896 · 66.81 citations · h-index: 12
Predicted impact: top 11% in QM (last 90 days) · Originality: incremental advance
AI Analysis

For protein engineers, ProtoCycle provides a data-efficient framework for translating natural language requirements into viable protein sequences.

ProtoCycle addresses the plan-execute gap in text-guided protein design by coupling an LLM planner with a tool environment and reflection, achieving strong language alignment and competitive foldability with limited supervision.

Designing proteins that satisfy natural language functional requirements is a central goal in protein engineering. A straightforward baseline is to fine-tune generic instruction-tuned LLMs as direct text-to-sequence generators, but this is data- and compute-hungry. With limited supervision, LLMs can produce coherent plans in text yet fail to reliably realize them as sequences. This plan-execute gap motivates ProtoCycle, an agentic framework for protein design that uses LLMs primarily to drive a multi-round, feedback-driven decision cycle. ProtoCycle couples an LLM planner with a lightweight tool environment designed to emulate the iterative workflow of human protein engineering and uses LLM-driven reflection on tool feedback to revise plans. Trained with supervised trajectories and online reinforcement learning, ProtoCycle achieves strong language alignment while maintaining competitive foldability, and ablations show that reflection substantially improves sequence quality.
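The decision cycle described above (plan, run tools, reflect on the feedback, revise) can be illustrated with a toy loop. This is only a sketch under stated assumptions, not ProtoCycle's implementation: the names `Plan`, `propose_plan`, `run_tools`, `reflect`, and `design_cycle` are hypothetical, the LLM planner and reflection steps are replaced by fixed rules, and the "tool" is a made-up hydrophobic-fraction score rather than a real foldability or alignment metric. The supervised and reinforcement learning stages are not shown.

```python
from dataclasses import dataclass, field

# Toy residue set used by the stand-in tool (an assumption for illustration).
HYDROPHOBIC = set("AVILMFWY")


@dataclass
class Plan:
    """Hypothetical plan state: a candidate sequence plus reflection notes."""
    sequence: str
    notes: list = field(default_factory=list)


def propose_plan(requirement: str) -> Plan:
    # Stand-in for the LLM planner: ignores the text and returns a fixed seed.
    return Plan(sequence="GSGSGSGSGS")


def run_tools(plan: Plan) -> dict:
    # Stand-in for the tool environment: a toy score, not a real metric.
    hits = sum(residue in HYDROPHOBIC for residue in plan.sequence)
    return {"hydrophobic_fraction": hits / len(plan.sequence)}


def reflect(plan: Plan, feedback: dict, target: float) -> Plan:
    # Stand-in for LLM reflection: read the feedback and revise the plan.
    if feedback["hydrophobic_fraction"] >= target:
        return plan
    residues = list(plan.sequence)
    for i, residue in enumerate(residues):
        if residue not in HYDROPHOBIC:
            residues[i] = "L"  # swap one residue for leucine
            break
    return Plan("".join(residues), plan.notes + ["raised hydrophobic content"])


def design_cycle(requirement: str, target: float = 0.3, max_rounds: int = 10):
    """Run the plan -> tools -> reflect loop until the toy target is met."""
    plan = propose_plan(requirement)
    for _ in range(max_rounds):
        feedback = run_tools(plan)
        if feedback["hydrophobic_fraction"] >= target:
            break
        plan = reflect(plan, feedback, target)
    return plan, run_tools(plan)


if __name__ == "__main__":
    final_plan, final_feedback = design_cycle("a membrane-anchoring peptide")
    print(final_plan.sequence, final_plan.notes, final_feedback)
```

In this sketch each round revises the plan only in response to tool feedback, which mirrors the role the summary assigns to reflection; a real system would replace the three stand-in functions with model calls and domain tools.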
