QM · LG · Mar 11, 2023

A Systematic Study of Joint Representation Learning on Protein Sequences and Structures

arXiv:2303.06275v2 · 53 citations · h-index: 37 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the limited availability of protein structure data with an incremental method that improves function prediction through joint representation learning on sequences and structures.

The study tackled the challenge of integrating protein sequence and structure information for representation learning by combining a state-of-the-art Protein Language Model (ESM-2) with structure encoders, achieving significant improvements and setting new state-of-the-art results for protein function annotation.

Learning effective protein representations is critical in a variety of tasks in biology, such as predicting protein functions. Recent sequence representation learning methods based on Protein Language Models (PLMs) excel in sequence-based tasks, but their direct adaptation to tasks involving protein structures remains a challenge. In contrast, structure-based methods, which leverage 3D structural information with graph neural networks and geometric pre-training, show potential in function prediction tasks but still suffer from the limited number of available structures. To bridge this gap, our study undertakes a comprehensive exploration of joint protein representation learning by integrating a state-of-the-art PLM (ESM-2) with distinct structure encoders (GVP, GearNet, CDConv). We introduce three representation fusion strategies and explore different pre-training techniques. Our method achieves significant improvements over existing sequence- and structure-based methods, setting a new state of the art for function annotation. This study underscores several important design choices for fusing protein sequence and structure information. Our implementation is available at https://github.com/DeepGraphLearning/ESM-GearNet.
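
To make the idea of fusing sequence and structure representations concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation and does not use ESM-2, GVP, GearNet, or CDConv. The abstract does not name its three fusion strategies, so the two variants shown here, and every name in the code (`contact_graph`, `structure_encoder`, `serial_fusion`, `parallel_fusion`, the 8.0 cutoff), are illustrative assumptions. The sequence embeddings are random stand-ins for PLM output, and the structure encoder is a single round of mean aggregation over a distance-based contact graph.

```python
import math
import random

def contact_graph(coords, cutoff=8.0):
    """Neighbour lists: residues whose coordinates lie within `cutoff` (assumed value)."""
    n = len(coords)
    return [[j for j in range(n)
             if j != i and math.dist(coords[i], coords[j]) <= cutoff]
            for i in range(n)]

def structure_encoder(node_feats, neighbours):
    """Toy stand-in for a structure encoder: one round of mean aggregation."""
    out = []
    for i, feat in enumerate(node_feats):
        group = [feat] + [node_feats[j] for j in neighbours[i]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out

def serial_fusion(seq_emb, neighbours):
    # Hypothetical variant 1: sequence embeddings become the input
    # node features of the structure encoder.
    return structure_encoder(seq_emb, neighbours)

def parallel_fusion(seq_emb, struct_feats, neighbours):
    # Hypothetical variant 2: both encoders run independently and their
    # per-residue outputs are concatenated.
    struct_out = structure_encoder(struct_feats, neighbours)
    return [s + g for s, g in zip(seq_emb, struct_out)]

rng = random.Random(0)
coords = [(3.8 * i, 0.0, 0.0) for i in range(5)]              # 5 residues on a line
seq_emb = [[rng.random() for _ in range(4)] for _ in coords]  # stand-in for PLM output
struct_feats = [[float(i), 1.0] for i in range(len(coords))]  # stand-in geometric features

neighbours = contact_graph(coords)
serial = serial_fusion(seq_emb, neighbours)
parallel = parallel_fusion(seq_emb, struct_feats, neighbours)
```

In this sketch the serial variant keeps the embedding width of the sequence model (4 values per residue), while the parallel variant widens each residue vector to the sum of both encoders' widths (4 + 2). Which arrangement works best in practice is the kind of design choice the study evaluates.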

Code Implementations: 3 repos