Cross-Modality Protein Embedding for Compound-Protein Affinity and Contact Prediction
This work addresses drug discovery by improving predictions for compound-protein interactions, though it appears incremental as it builds on existing multi-modal approaches.
The study tackled the problem of predicting compound-protein affinity and contact (CPAC) by treating proteins as multi-modal data with 1D sequences and 2D contact maps. It found that a cross-modality embedding model with cross interaction outperformed state-of-the-art methods and single-modality models in predictions for unseen proteins.
Compound-protein pairs dominate FDA-approved drug-target pairs, and the prediction of compound-protein affinity and contact (CPAC) could help accelerate drug discovery. In this study we consider proteins as multi-modal data including 1D amino-acid sequences and (sequence-predicted) 2D residue-pair contact maps. We empirically evaluate the embeddings of the two single modalities for their accuracy and generalizability in CPAC prediction (i.e., structure-free interpretable compound-protein affinity prediction). We rationalize their performance in terms of two challenges: embedding the individual modalities and learning a generalizable embedding-label relationship. We further propose two models involving cross-modality protein embedding and establish that the one with cross interaction (thus capturing correlations among modalities) outperforms state-of-the-art methods and our single-modality models in affinity, contact, and binding-site predictions for proteins never seen in the training set.
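To make the idea of cross-modality protein embedding concrete, the following is a minimal NumPy sketch, not the architecture used in this work. It encodes a protein as a 1D one-hot sequence and a 2D contact map, embeds each modality per residue, and then fuses the two either by plain concatenation or by a cross interaction in which each modality attends to the other. Everything here is an assumption made for illustration: the function names, the single linear and graph-aggregation layers, the attention-style fusion, the random weights, and the toy sequence and contacts. The text above does not specify the encoders or how the two proposed models differ in detail.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues


def one_hot_sequence(seq):
    """1D modality: (L, 20) one-hot encoding of an amino-acid sequence."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, index[aa]] = 1.0
    return x


def normalize_contacts(contacts):
    """2D modality: contact map with self-loops, row-normalized, shape (L, L)."""
    a = contacts + np.eye(contacts.shape[0])
    return a / a.sum(axis=1, keepdims=True)


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def embed_two_modalities(seq, contacts, w_seq, w_graph):
    """Per-residue embeddings from each modality (hypothetical one-layer encoders)."""
    x = one_hot_sequence(seq)
    h_seq = np.tanh(x @ w_seq)  # sequence-only embedding, (L, d)
    # Graph-style embedding: average features over contacting residues.
    h_graph = np.tanh(normalize_contacts(contacts) @ x @ w_graph)  # (L, d)
    return h_seq, h_graph


def fuse_concat(h_seq, h_graph):
    """Fusion without interaction: stack the two embeddings side by side."""
    return np.concatenate([h_seq, h_graph], axis=1)  # (L, 2d)


def fuse_cross(h_seq, h_graph):
    """Fusion with cross interaction: each modality attends to the other."""
    d = h_seq.shape[1]
    attn_seq_to_graph = softmax(h_seq @ h_graph.T / np.sqrt(d))  # (L, L)
    attn_graph_to_seq = softmax(h_graph @ h_seq.T / np.sqrt(d))  # (L, L)
    seq_context = attn_seq_to_graph @ h_graph    # graph info routed to sequence
    graph_context = attn_graph_to_seq @ h_seq    # sequence info routed to graph
    return np.concatenate([h_seq + seq_context, h_graph + graph_context], axis=1)


# Toy inputs (made up for illustration only).
rng = np.random.default_rng(0)
seq = "MKTAYIAK"
L, d = len(seq), 4
contacts = np.zeros((L, L))
for i in range(L - 1):  # neighbouring residues are in contact
    contacts[i, i + 1] = contacts[i + 1, i] = 1.0
contacts[0, 5] = contacts[5, 0] = 1.0  # one long-range contact

w_seq = rng.normal(size=(len(AMINO_ACIDS), d))
w_graph = rng.normal(size=(len(AMINO_ACIDS), d))

h_seq, h_graph = embed_two_modalities(seq, contacts, w_seq, w_graph)
concat_emb = fuse_concat(h_seq, h_graph)
cross_emb = fuse_cross(h_seq, h_graph)
print(concat_emb.shape, cross_emb.shape)
```

Both fused embeddings have one row per residue and 2d columns, so either could feed the same downstream affinity or contact predictor. The difference is that the cross-interaction variant lets the embedding of each residue in one modality depend on the other modality, which is one simple way to capture correlations among modalities.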