BM LG May 31, 2023

Improving Protein-peptide Interface Predictions in the Low Data Regime

arXiv:2306.00557v1
Originality: Incremental advance
AI Analysis

This paper addresses a domain-specific problem in computational biology: accurate prediction of protein-peptide interfaces when little training data is available. It is an incremental improvement aimed at researchers working in that low-data setting.

The paper tackles the problem of predicting protein-peptide interactions with limited crystallized data by augmenting datasets with pseudo complexes from PDB, showing that this increases the predictive power of their bi-modal transformer architecture.

We propose a novel approach for predicting protein-peptide interactions using a bi-modal transformer architecture that learns an interfacial joint distribution of residue contacts. The current data sets of crystallized protein-peptide complexes are limited, making it difficult to accurately predict interactions between proteins and peptides. To address this issue, we propose augmenting the existing data from PepBDB with pseudo protein-peptide complexes derived from the PDB. The augmented data set acts as a method to transfer physics-based, context-dependent intra-residue (within a domain) interactions to the inter-residue (between domains) setting. We show that the distributions of interfacial residue-residue interactions overlap with those of intra-domain residue-residue interactions enough to increase the predictive power of our bi-modal transformer architecture. In addition, this data augmentation allows us to leverage the vast amount of protein-only data available in the PDB to train neural networks, in contrast to template-based modeling, which acts as a prior.
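The abstract does not say how the pseudo complexes are built, so the following is only a minimal sketch of one plausible reading: cut a contiguous segment out of a single protein chain, treat it as a pseudo-peptide and the rest as a pseudo-receptor, and label receptor-peptide residue pairs by a distance cutoff. The function name `pseudo_complex`, the 8.0 Å cutoff, the use of one coordinate per residue, and the toy straight-line chain are all assumptions made for illustration, not details taken from the paper.

```python
import math


def pseudo_complex(coords, pep_start, pep_len, cutoff=8.0):
    """Split one chain into a pseudo-peptide and a pseudo-receptor.

    coords   : list of (x, y, z) tuples, one per residue (e.g. C-alpha atoms).
    pep_start: index of the first residue of the excised segment.
    pep_len  : number of residues in the excised segment.
    cutoff   : distance threshold in angstroms (hypothetical default).

    Returns (receptor_idx, peptide_idx, contacts), where contacts[i][j] is 1
    if receptor residue i and peptide residue j lie within the cutoff.
    """
    pep_end = pep_start + pep_len
    if pep_start < 0 or pep_len <= 0 or pep_end > len(coords):
        raise ValueError("peptide segment must lie inside the chain")

    peptide_idx = list(range(pep_start, pep_end))
    receptor_idx = [i for i in range(len(coords))
                    if i < pep_start or i >= pep_end]

    # Binary contact map between the two pseudo-domains.
    contacts = [
        [1 if math.dist(coords[r], coords[p]) <= cutoff else 0
         for p in peptide_idx]
        for r in receptor_idx
    ]
    return receptor_idx, peptide_idx, contacts


# Toy chain: 10 residues on a straight line, 3.8 angstroms apart.
toy_chain = [(3.8 * i, 0.0, 0.0) for i in range(10)]
receptor_idx, peptide_idx, contacts = pseudo_complex(toy_chain, pep_start=4, pep_len=3)
```

In this toy chain only residues at most two positions apart fall within the cutoff, so every contact sits next to a cut point. A real pipeline would likely mask such sequence-adjacent pairs and read coordinates from PDB files; both steps are left out here.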
