AI CE QM · Jan 25, 2025

Controllable Protein Sequence Generation with LLM Preference Optimization

arXiv:2501.15007v1 · 9 citations · h-index: 4 · AAAI
Originality: Highly original
AI Analysis

This work addresses biomedical challenges by enabling more effective protein design, though it appears incremental because it builds on existing protein LLMs.

The paper tackles the problem of generating protein sequences with specific attributes by proposing CtrlProt, a method that fine-tunes a protein LLM with multi-listwise preference optimization, achieving state-of-the-art performance in functionality and structural stability for both single- and multi-attribute generation.

Designing proteins with specific attributes offers an important solution to address biomedical challenges. Pre-trained protein large language models (LLMs) have shown promising results on protein sequence generation. However, when generation is controlled for specific attributes, sequences produced by existing methods still exhibit poor functionality and structural stability. In this paper, we propose a novel controllable protein design method called CtrlProt. We fine-tune a protein LLM with a new multi-listwise preference optimization strategy to improve generation quality and support multi-attribute controllable generation. Experiments demonstrate that CtrlProt can meet functionality and structural stability requirements effectively, achieving state-of-the-art performance in both single-attribute and multi-attribute protein sequence generation.
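
The abstract names a listwise preference objective but does not spell it out. As a rough illustration only, the sketch below shows a generic Plackett-Luce listwise ranking loss averaged over several ranked lists. It is an assumption for intuition, not CtrlProt's actual objective: the function names `listwise_preference_loss` and `multi_list_loss` are hypothetical, and the scores are assumed to be model-derived preference scores (for example, scaled log-likelihood ratios against a reference model) already ordered from most to least preferred.

```python
import math


def listwise_preference_loss(scores):
    """Plackett-Luce negative log-likelihood of one ranked list.

    `scores` holds one score per candidate sequence, ordered from most
    to least preferred. (Hypothetical helper, not taken from the paper.)
    """
    loss = 0.0
    # At each rank k, the preferred item should win a softmax over
    # itself and every lower-ranked item.
    for k in range(len(scores) - 1):
        tail = scores[k:]
        m = max(tail)  # subtract the max for numerical stability
        log_norm = m + math.log(sum(math.exp(s - m) for s in tail))
        loss += log_norm - scores[k]
    return loss


def multi_list_loss(ranked_lists):
    """Average the listwise loss over several ranked lists.

    Assumption: one ranked list per attribute or per quality criterion.
    """
    return sum(listwise_preference_loss(s) for s in ranked_lists) / len(ranked_lists)
```

Under this sketch, a list whose scores already agree with the preferred ordering gets a lower loss than the same scores in reverse order, which is the signal a fine-tuning step would follow.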

Code Implementations: 1 repo
