AgentPLM: Agentic Protein Language Models with Reasoning-Augmented Decoding for Protein Sequence Design
For protein engineers and computational biologists, AgentPLM provides a method to generate protein sequences that satisfy thermodynamic and structural constraints without explicit backtracking, improving hit rates in antibody optimization and other design tasks.
AgentPLM equips protein language models with Reasoning-Augmented Decoding (RAD) and Contrastive Agent Policy Optimisation (CAPO) to incorporate external biophysical feedback during generation, achieving state-of-the-art results with a 12% gain in antibody top-10% hit rate over the strongest passive baseline.
Protein language models (PLMs) are passive oracles: they generate sequences in a single forward pass, with no mechanism to consult external biophysical feedback or to redirect generation when a candidate violates thermodynamic or structural constraints. We introduce AgentPLM, which addresses this limitation by equipping a pre-trained PLM with (i) Reasoning-Augmented Decoding (RAD), which interleaves autoregressive generation with tool calls (ESMFold, FoldX, AutoDock Vina), and (ii) Contrastive Agent Policy Optimisation (CAPO), a trajectory-level extension of direct preference optimisation that trains the policy end-to-end to learn when oracle feedback is informative rather than merely to imitate high-fitness sequences. We evaluate AgentPLM on benchmark tasks spanning de novo enzyme design, antibody optimisation, thermostability, PPI interface design, and zero-shot fitness prediction, using standardised oracle APIs and controlled sequence-identity splits. AgentPLM achieves state-of-the-art results, including a 12% gain in antibody top-10% hit rate over the strongest passive baseline, and provides mechanistic evidence of online error correction without explicit backtracking.
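To make the "trajectory-level extension of direct preference optimisation" concrete, the following is a minimal sketch of a standard DPO loss applied to whole trajectories rather than single responses. It is an illustration under stated assumptions, not the CAPO objective itself: the abstract does not specify how CAPO scores or aggregates trajectories, so summing per-step log-probabilities (sequence tokens and tool-call decisions alike) is assumed here, and the names `trajectory_logprob`, `dpo_trajectory_loss`, and the default `beta` are hypothetical.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def trajectory_logprob(step_logprobs: Sequence[float]) -> float:
    """Log-probability of a full trajectory.

    ASSUMPTION: a trajectory's log-probability is the sum of its per-step
    log-probabilities (generated residues and tool-call decisions alike).
    """
    return sum(step_logprobs)


def dpo_trajectory_loss(
    policy_preferred: Sequence[float],
    policy_rejected: Sequence[float],
    ref_preferred: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,  # hypothetical default; not taken from the source
) -> float:
    """Standard DPO loss on one (preferred, rejected) trajectory pair.

    loss = -log sigmoid(beta * [(log pi(w) - log ref(w))
                                - (log pi(l) - log ref(l))])
    """
    preferred_margin = (
        trajectory_logprob(policy_preferred) - trajectory_logprob(ref_preferred)
    )
    rejected_margin = (
        trajectory_logprob(policy_rejected) - trajectory_logprob(ref_rejected)
    )
    logit = beta * (preferred_margin - rejected_margin)
    # -log(sigmoid(z)) equals softplus(-z)
    return softplus(-logit)


if __name__ == "__main__":
    # Toy per-step log-probabilities for two trajectories under the
    # trainable policy and a frozen reference model.
    loss = dpo_trajectory_loss(
        policy_preferred=[-0.5, -1.0],
        policy_rejected=[-3.0, -3.0],
        ref_preferred=[-1.0, -2.0],
        ref_rejected=[-1.0, -2.0],
        beta=1.0,
    )
    print(f"loss = {loss:.4f}")
```

When the policy and the reference model assign identical log-probabilities, the loss equals log 2. It falls below that value as the policy shifts probability mass towards the preferred trajectory relative to the reference.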