LG · BM · Jun 21, 2023

Predicting protein variants with equivariant graph neural networks

Cambridge
arXiv:2306.12231v2 · 5 citations · h-index: 15 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses a research gap in protein engineering by comparing structure- and sequence-based methods for variant prediction, though it is incremental, since it builds on existing pre-trained models.

The paper compared structure-based equivariant graph neural networks (EGNNs) with sequence-based methods for predicting beneficial protein variants, finding that the structural approach achieved competitive performance while using significantly less training data.
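To make the structural approach more concrete, here is a minimal pure-Python sketch of a single E(n)-equivariant message-passing layer in the style of Satorras et al. (2021). It is not the paper's model: the class name `EGNNLayer`, the single dense layers with random weights standing in for the MLPs, and the fully connected residue graph are all illustrative assumptions. What it shows is the key property: messages depend only on node features and squared distances, so rotating or translating the input coordinates leaves the output features unchanged and moves the output coordinates in the same way.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = List[float]


def linear(weights: List[Vec], bias: Vec, inp: Sequence[float]) -> Vec:
    """Dense layer: one weight row per output unit."""
    return [sum(w * v for w, v in zip(row, inp)) + b for row, b in zip(weights, bias)]


def silu(v: Vec) -> Vec:
    """SiLU activation, z * sigmoid(z), applied element-wise."""
    return [z / (1.0 + math.exp(-z)) for z in v]


class EGNNLayer:
    """One E(n)-equivariant layer (illustrative; random single-layer 'MLPs')."""

    def __init__(self, feat_dim: int, msg_dim: int, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> List[Vec]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        # phi_e: [h_i, h_j, ||x_i - x_j||^2] -> message m_ij
        self.we, self.be = init(msg_dim, 2 * feat_dim + 1), [0.0] * msg_dim
        # phi_x: m_ij -> scalar weight for the relative position x_i - x_j
        self.wx, self.bx = init(1, msg_dim), [0.0]
        # phi_h: [h_i, sum_j m_ij] -> updated node features
        self.wh, self.bh = init(feat_dim, feat_dim + msg_dim), [0.0] * feat_dim

    def __call__(self, h: List[Vec], x: List[Vec]) -> Tuple[List[Vec], List[Vec]]:
        n = len(h)
        new_h: List[Vec] = []
        new_x: List[Vec] = []
        for i in range(n):
            agg = [0.0] * len(self.be)
            shift = [0.0] * len(x[i])
            for j in range(n):
                if i == j:
                    continue
                diff = [a - b for a, b in zip(x[i], x[j])]
                dist2 = sum(d * d for d in diff)  # invariant to rotation/translation
                m_ij = silu(linear(self.we, self.be, h[i] + h[j] + [dist2]))
                agg = [a + m for a, m in zip(agg, m_ij)]
                w = linear(self.wx, self.bx, m_ij)[0]
                shift = [s + d * w for s, d in zip(shift, diff)]
            c = 1.0 / max(n - 1, 1)  # average over neighbours in the complete graph
            new_x.append([xi + c * s for xi, s in zip(x[i], shift)])
            new_h.append(linear(self.wh, self.bh, h[i] + agg))
        return new_h, new_x
```

Because coordinates are only ever updated by weighted sums of relative vectors `x_i - x_j`, and the weights come from invariant messages, applying a rotation R and translation t to the inputs yields output coordinates `R x' + t` and identical features.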

Pre-trained models have been successful in many protein engineering tasks. Most notably, sequence-based models have achieved state-of-the-art performance on protein fitness prediction, while structure-based models have been used experimentally to develop proteins with enhanced functions. However, there is a research gap in comparing structure- and sequence-based methods for predicting protein variants that are better than the wildtype protein. This paper aims to address this gap by conducting a comparative study of the abilities of equivariant graph neural networks (EGNNs) and sequence-based approaches to identify promising amino-acid mutations. The results show that our proposed structural approach achieves performance competitive with sequence-based methods while being trained on significantly fewer molecules. Additionally, we find that combining assay-labelled data with structure pre-trained models yields trends similar to those seen with sequence pre-trained models. Our code and trained models can be found at: https://github.com/semiluna/partIII-amino-acid-prediction.
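One common way to turn an amino-acid classifier into a variant ranking is a per-position log-odds score, log p(mutant) - log p(wildtype); a positive score means the model prefers the mutant residue over the wildtype one. The sketch below assumes a model has already produced a probability distribution over the 20 amino acids at each position (here these are toy numbers). The function names `log_odds_scores` and `toy_distribution` are hypothetical, and this is not necessarily the exact ranking rule used in the paper.

```python
import math
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_odds_scores(wildtype: str, probs: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Score every single-point mutation as log p(mutant) - log p(wildtype).

    probs[i] is an assumed model distribution over the 20 amino acids at
    position i (all probabilities must be strictly positive).
    """
    scored = []
    for i, wt in enumerate(wildtype):
        p_wt = probs[i][wt]
        for aa in AMINO_ACIDS:
            if aa == wt:
                continue
            score = math.log(probs[i][aa]) - math.log(p_wt)
            scored.append((f"{wt}{i + 1}{aa}", score))  # e.g. "G2S"
    # Highest score first: mutations the model favours over the wildtype residue.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


def toy_distribution(favoured: str, p_fav: float) -> Dict[str, float]:
    """Hypothetical distribution: p_fav on one residue, the rest spread evenly."""
    rest = (1.0 - p_fav) / (len(AMINO_ACIDS) - 1)
    return {aa: (p_fav if aa == favoured else rest) for aa in AMINO_ACIDS}


if __name__ == "__main__":
    # Toy two-residue protein "AG": the model likes A at position 1, S at position 2.
    probs = [toy_distribution("A", 0.6), toy_distribution("S", 0.5)]
    print(log_odds_scores("AG", probs)[0][0])  # → G2S
```

Ranking by a ratio against the wildtype, rather than by raw mutant probability, directly targets the question the paper asks: which variants are likely to be better than the wildtype protein.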

Code Implementations: 1 repo