GeoPep: A geometry-aware masked language model for protein-peptide binding site prediction
This work addresses protein-peptide binding site prediction, a domain-specific problem in computational biology, and advances it incrementally through transfer learning and architectural integration.
The paper tackles the challenge of predicting protein-peptide binding sites, which is hindered by peptide flexibility and limited structural data. It introduces GeoPep, a framework that fine-tunes a multimodal protein foundation model with a distance-based loss and reports significant performance improvements over existing methods.
Multimodal approaches that integrate protein structure and sequence have achieved remarkable success in protein-protein interface prediction. However, extending these methods to protein-peptide interactions remains challenging: peptides are inherently conformationally flexible, and the limited availability of structural data hinders direct training of structure-aware models. To address these limitations, we introduce GeoPep, a novel framework for peptide binding site prediction that leverages transfer learning from ESM3, a multimodal protein foundation model. GeoPep fine-tunes ESM3's rich pre-learned representations from protein-protein binding to compensate for the scarcity of protein-peptide binding data. The fine-tuned model is further integrated with a parameter-efficient neural network architecture capable of learning complex patterns from sparse data. In addition, the model is trained using distance-based loss functions that exploit 3D structural information to enhance binding site prediction. Comprehensive evaluations demonstrate that GeoPep significantly outperforms existing methods in protein-peptide binding site prediction by effectively capturing sparse and heterogeneous binding patterns.
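The abstract does not specify the form of the distance-based loss functions, so the following is only a minimal sketch of one plausible reading, not GeoPep's actual loss. It assumes binding labels are derived from each residue's minimum distance to the peptide, and that mistakes on non-binding residues are weighted more heavily the farther those residues lie from the peptide. The function names, the 5.0 Å cutoff, and the `scale` parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def min_distances(protein_xyz, peptide_xyz):
    """Minimum Euclidean distance from each protein residue to any peptide point.

    protein_xyz: array of shape (n_residues, 3)
    peptide_xyz: array of shape (n_peptide_points, 3)
    """
    # Broadcast to (n_residues, n_peptide_points, 3), then reduce.
    diff = protein_xyz[:, None, :] - peptide_xyz[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def distance_weighted_bce(probs, dists, cutoff=5.0, scale=10.0, eps=1e-7):
    """Binary cross-entropy with distance-dependent weights (illustrative only).

    probs:  predicted binding probability per residue, shape (n_residues,)
    dists:  minimum residue-peptide distance per residue, same shape
    cutoff: hypothetical distance threshold defining a binding residue
    scale:  hypothetical length scale controlling how fast the weight grows
    """
    probs = np.clip(probs, eps, 1.0 - eps)
    labels = (dists < cutoff).astype(float)
    # Binding residues get weight 1; non-binding residues get a weight
    # that grows linearly with their distance beyond the cutoff.
    weights = np.where(
        labels == 1.0, 1.0, 1.0 + np.maximum(dists - cutoff, 0.0) / scale
    )
    bce = -(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs))
    # Weighted mean over residues.
    return float((weights * bce).sum() / weights.sum())


# Toy example: three residues on a line, one peptide point.
protein = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [20.0, 0.0, 0.0]])
peptide = np.array([[1.0, 0.0, 0.0]])
dists = min_distances(protein, peptide)

good = distance_weighted_bce(np.array([0.9, 0.9, 0.1]), dists)
bad = distance_weighted_bce(np.array([0.1, 0.1, 0.9]), dists)
```

In this toy setup the first two residues fall within the assumed cutoff and the third does not, so predictions that agree with that pattern (`good`) yield a lower loss than predictions that invert it (`bad`). A real training setup would express the same idea in a differentiable framework and would choose the cutoff and weighting scheme empirically.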