ML LG | Nov 10, 2022

Probabilistic thermal stability prediction through sparsity promoting transformer representation

Yevgen Zainchkovskyy, Jesper Ferkinghoff-Borg, Anja Bennett, Thomas Egebjerg, Nikolai Lorenzen, Per Jr. Greisen, Søren Hauberg, Carsten Stahlhut

arXiv:2211.05698v1 | 2.1 | h-index: 24

Originality: Incremental advance

AI Analysis

This work addresses the need for more accurate and robust thermal stability predictions in ML-driven drug design, though it appears incremental by building on existing pre-trained protein language models.

The paper tackles the problem of predicting protein thermal stability by introducing a sparsity-promoting transformer representation and a probabilistic framework, achieving a mean absolute error of 0.23 °C for melting temperature prediction of single-chain variable fragments.

Pre-trained protein language models have demonstrated significant applicability in a range of protein engineering tasks. A common way to use the latent representations of these pre-trained transformer models is to mean-pool across residue positions, reducing the feature dimensions for downstream tasks such as predicting biophysical properties or other functional behaviours. In this paper we provide a two-fold contribution to machine learning (ML) driven drug design. Firstly, we demonstrate the power of sparsity-promoting penalization of pre-trained transformer models to secure more robust and accurate melting temperature (Tm) prediction of single-chain variable fragments, with a mean absolute error of 0.23 °C. Secondly, we demonstrate the power of framing our prediction problem in a probabilistic framework. Specifically, we advocate for adopting probabilistic frameworks, especially in the context of ML-driven drug design.
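
The mean-pooling step and the idea of a sparsity-promoting penalty can be illustrated with a small sketch. The abstract does not specify the paper's actual penalization scheme, so the code below swaps in a plain L1 (lasso) penalty on a linear model over mean-pooled features, solved with proximal gradient descent (ISTA). All function names, the toy data, and the hyperparameters are assumptions made for illustration, and random vectors stand in for real per-residue embeddings.

```python
import random


def mean_pool(residue_embeddings):
    """Average an L x D list of per-residue vectors into one D-vector."""
    length = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(row[j] for row in residue_embeddings) / length for j in range(dim)]


def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_ista(features, targets, penalty, step, iterations=500):
    """Minimise (1/2n)*||Xw - y||^2 + penalty*||w||_1 with ISTA.

    `step` should not exceed 1 / (largest eigenvalue of X^T X / n).
    """
    n, dim = len(features), len(features[0])
    weights = [0.0] * dim
    for _ in range(iterations):
        residuals = [
            sum(w * x for w, x in zip(weights, row)) - y
            for row, y in zip(features, targets)
        ]
        gradient = [
            sum(r * row[j] for r, row in zip(residuals, features)) / n
            for j in range(dim)
        ]
        # Gradient step on the squared error, then L1 shrinkage.
        weights = [
            soft_threshold(w - step * g, step * penalty)
            for w, g in zip(weights, gradient)
        ]
    return weights


def demo():
    """Toy run: random 'embeddings' (hypothetical), target uses 2 of 8 dims."""
    rng = random.Random(0)
    dim = 8
    pooled, targets = [], []
    for _ in range(60):
        length = rng.randint(8, 15)  # sequences differ in length
        embedding = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(length)]
        x = mean_pool(embedding)  # fixed-size feature vector per sequence
        pooled.append(x)
        targets.append(10.0 * x[0] - 8.0 * x[3] + rng.gauss(0.0, 0.1))
    mean_target = sum(targets) / len(targets)
    centred = [t - mean_target for t in targets]  # no intercept term in the model
    weights = lasso_ista(pooled, centred, penalty=0.05, step=1.0, iterations=1000)
    print("weights:", [round(w, 2) for w in weights])
    print("non-zero:", sum(1 for w in weights if w != 0.0))


if __name__ == "__main__":
    demo()
```

Mean pooling gives every sequence a feature vector of the same size regardless of its length, and the L1 term drives the weights of uninformative feature dimensions to exactly zero. A larger `penalty` yields a sparser model at the cost of shrinking the remaining weights.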
