LG · CE · Dec 5, 2024

Leveraging Multi-modal Representations to Predict Protein Melting Temperatures

arXiv:2412.04526v3 · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses protein stability prediction for protein engineering. It is an incremental advance because it builds on existing multi-modal representations and protein language models.

The study predicts protein melting temperature changes (ΔTm) with models built on pretrained protein representations from the language models ESM-2 and ESM-3 and from the structure model AlphaFold. Using ESM-3, it reports a new state-of-the-art Pearson correlation coefficient (PCC) of 0.50 on the s571 test dataset.
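The headline number is a Pearson correlation coefficient between measured and predicted ΔTm values. As a minimal sketch of how that metric is computed, the standard-library function below follows the textbook definition; the name `pearson_r` and the sample values are illustrative assumptions, not taken from the paper or from the s571 dataset.

```python
import math
from typing import Sequence


def pearson_r(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation between measured and predicted ΔTm values."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Covariance and variances share the 1/n factor, so it cancels out.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("PCC is undefined when either input is constant")
    return cov / math.sqrt(var_t * var_p)


# Hypothetical ΔTm values in °C (made up for illustration, not s571 data).
measured = [-3.2, 1.5, -0.8, 4.1, -6.0]
predicted = [-2.0, 0.9, 0.4, 2.5, -4.1]
print(round(pearson_r(measured, predicted), 3))
```

Because PCC is unchanged when predictions are shifted or scaled by a positive factor, it measures linear agreement rather than absolute error: a model can reach a high PCC while its ΔTm values are systematically offset.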

Accurately predicting protein melting temperature changes (ΔTm) is fundamental for assessing protein stability and guiding protein engineering. Leveraging multi-modal protein representations has shown great promise in capturing the complex relationships among protein sequences, structures, and functions. In this study, we develop models based on powerful protein language models, including ESM-2, ESM-3 and AlphaFold, using various feature extraction methods to enhance prediction accuracy. By utilizing the ESM-3 model, we achieve a new state-of-the-art performance on the s571 test dataset, obtaining a Pearson correlation coefficient (PCC) of 0.50. Furthermore, we conduct a fair evaluation to compare the performance of different protein language models in the ΔTm prediction task. Our results demonstrate that integrating multi-modal protein representations could advance the prediction of protein melting temperatures.
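The abstract does not specify the feature extraction methods. One common pattern for mutation-effect regression, sketched here under explicit assumptions, is to embed the wild-type and mutant sequences, take the change in embedding at the mutated site and in the mean-pooled embedding, concatenate those changes across modalities, and fit a regressor. Everything below is hypothetical: `toy_encoder` is a random stand-in for real ESM or AlphaFold representations, `MODALITIES`, `mutation_features`, and `fit_ridge` are illustrative names, ridge regression is a swapped-in model choice, and the sequence and labels are made up.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
# Two HYPOTHETICAL "modalities" as (embedding width, random seed). In practice
# these might be sequence embeddings (e.g. ESM-2/ESM-3) and structure features.
MODALITIES = [(16, 1), (8, 2)]


def toy_encoder(seq: str, dim: int, seed: int) -> np.ndarray:
    """HYPOTHETICAL stand-in for a pretrained encoder: returns an (L, dim)
    per-residue embedding. Each residue type maps to a fixed random vector,
    so unlike a real language model it ignores sequence context."""
    table = np.random.default_rng(seed).normal(size=(len(ALPHABET), dim))
    return table[[ALPHABET.index(aa) for aa in seq]]


def mutation_features(wt: str, pos: int, mut_aa: str) -> np.ndarray:
    """Concatenate, across modalities, the mutant-minus-wild-type change at the
    mutated site and in the mean-pooled embedding. With the context-free toy
    encoder the two parts are proportional; with real models they differ."""
    mut = wt[:pos] + mut_aa + wt[pos + 1:]
    parts = []
    for dim, seed in MODALITIES:
        wt_emb = toy_encoder(wt, dim, seed)
        mut_emb = toy_encoder(mut, dim, seed)
        parts.append(mut_emb[pos] - wt_emb[pos])                  # local change
        parts.append(mut_emb.mean(axis=0) - wt_emb.mean(axis=0))  # global change
    return np.concatenate(parts)


def fit_ridge(X: np.ndarray, y: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Closed-form ridge regression with an unpenalized intercept."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    penalty = alpha * np.eye(X1.shape[1])
    penalty[-1, -1] = 0.0  # do not shrink the intercept
    return np.linalg.solve(X1.T @ X1 + penalty, X1.T @ y)


def predict(w: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Apply fitted ridge weights (last entry is the intercept)."""
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w


# HYPOTHETICAL wild-type sequence, point mutations (0-based position, new
# residue) and made-up ΔTm labels in °C; none of these come from the paper.
wt_seq = "MKTAYIAKQRQISFVKSHFSRQ"
mutations = [(3, "V"), (5, "P"), (8, "G"), (10, "L"), (12, "D"), (15, "W")]
dtm = np.array([1.2, -4.5, -2.1, 0.7, -3.3, 0.4])

X = np.stack([mutation_features(wt_seq, p, aa) for p, aa in mutations])
w = fit_ridge(X, dtm, alpha=1.0)
print(X.shape, predict(w, X).round(2))
```

Concatenating per-modality feature blocks is the simplest form of multi-modal fusion; comparing models fairly, as the abstract describes, would mean holding the regressor, splits, and metric fixed while swapping only the encoder.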
