Improving Inverse Folding for Peptide Design with Diversity-regularized Direct Preference Optimization
This work addresses the challenge of designing diverse, structurally accurate peptides for protein engineering and drug discovery, and represents an incremental improvement over existing models.
The paper tackles the problem of inverse folding models generating repetitive, structurally inconsistent peptide sequences. It fine-tunes ProteinMPNN with diversity-regularized Direct Preference Optimization, improving structural similarity scores by at least 8% over base ProteinMPNN and achieving up to 20% higher sequence diversity than standard DPO.
Inverse folding models play an important role in structure-based design by predicting amino acid sequences that fold into desired reference structures. Models like ProteinMPNN, a message-passing encoder-decoder model, are trained to reliably produce new sequences from a reference structure. However, when applied to peptides, these models are prone to generating repetitive sequences that do not fold into the reference structure. To address this, we fine-tune ProteinMPNN to produce diverse and structurally consistent peptide sequences via Direct Preference Optimization (DPO). We derive two enhancements to DPO: online diversity regularization and domain-specific priors. Additionally, we develop a new understanding of how to improve diversity in decoder models. When conditioned on OpenFold-generated structures, our fine-tuned models achieve state-of-the-art structural similarity scores, improving on base ProteinMPNN by at least 8%. Compared to standard DPO, our regularized method achieves up to 20% higher sequence diversity with no loss in structural similarity score.