Controllable Protein Sequence Generation with LLM Preference Optimization
The work addresses biomedical challenges by enabling more effective protein design, though it appears incremental because it builds on existing protein LLMs.
The paper tackles the problem of generating protein sequences with specified attributes. It proposes CtrlProt, a method that fine-tunes a protein LLM with multi-listwise preference optimization, and reports state-of-the-art functionality and structural stability for both single- and multi-attribute generation.
Designing proteins with specific attributes offers an important route to addressing biomedical challenges. Pre-trained protein large language models (LLMs) have shown promising results on protein sequence generation. However, when generation is controlled for specific attributes, the sequences produced by existing methods still show poor functionality and structural stability. In this paper, we propose a novel controllable protein design method called CtrlProt. We fine-tune a protein LLM with a new multi-listwise preference optimization strategy to improve generation quality and support multi-attribute controllable generation. Experiments demonstrate that CtrlProt effectively meets functionality and structural stability requirements, achieving state-of-the-art performance in both single-attribute and multi-attribute protein sequence generation.
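The abstract does not give the exact multi-listwise objective, so the following is only a minimal sketch of what a listwise preference loss can look like, assuming a DPO-style implicit reward (scaled log-probability ratio between the fine-tuned model and a frozen reference model) and a Plackett-Luce likelihood over a ranked list of candidate sequences. The function names, the `beta` value, and the per-attribute weighted sum in `multi_list_loss` are hypothetical illustrations, not CtrlProt's published formulation.

```python
import math
from typing import List, Optional, Sequence, Tuple


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def listwise_preference_loss(
    policy_logps: Sequence[float],
    ref_logps: Sequence[float],
    beta: float = 0.1,
) -> float:
    """Plackett-Luce negative log-likelihood of one ranked candidate list.

    Both inputs hold sequence log-probabilities of the SAME candidates,
    ordered from most to least preferred (assumed ranking, e.g. by a
    functionality or stability score).
    """
    if len(policy_logps) != len(ref_logps) or len(policy_logps) < 2:
        raise ValueError("need two or more paired log-probabilities")
    # DPO-style implicit reward for each candidate sequence
    rewards = [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]
    loss = 0.0
    # At rank k, candidate k should "win" against every lower-ranked candidate;
    # the final rank contributes zero, so it is skipped.
    for k in range(len(rewards) - 1):
        loss -= rewards[k] - logsumexp(rewards[k:])
    return loss


def multi_list_loss(
    lists: Sequence[Tuple[Sequence[float], Sequence[float]]],
    weights: Optional[List[float]] = None,
    beta: float = 0.1,
) -> float:
    """Hypothetical combination: weighted sum of listwise losses, one list per attribute."""
    if weights is None:
        weights = [1.0] * len(lists)
    if len(weights) != len(lists):
        raise ValueError("one weight per list is required")
    return sum(
        w * listwise_preference_loss(p, r, beta)
        for w, (p, r) in zip(weights, lists)
    )
```

With a list of two candidates this reduces to the standard pairwise DPO loss, `-log sigmoid(r_1 - r_2)`; longer lists let a single ranking supervise every candidate at once. A weighted sum is just one simple way to mix attributes, chosen here for readability.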