BM · AI · QM · Apr 3, 2022

Few Shot Protein Generation

arXiv:2204.01168v1 · 7 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses protein sequence generation for bioinformatics and improves performance in few-shot scenarios, where a protein family has only a small alignment. It is an incremental advance that builds on existing transformer and MSA-based approaches.

The researchers tackled the problem of generating protein sequences from protein families using multiple sequence alignments (MSAs), and their MSA-to-protein transformer outperformed conventional methods, especially with small MSAs, while accurately modeling epistasis and indels.

We present the MSA-to-protein transformer, a generative model of protein sequences conditioned on protein families represented by multiple sequence alignments (MSAs). Unlike existing approaches to learning generative models of protein families, the MSA-to-protein transformer conditions sequence generation directly on a learned encoding of the multiple sequence alignment, circumventing the need for fitting dedicated family models. By training on a large set of well-curated multiple sequence alignments in Pfam, our MSA-to-protein transformer generalizes well to protein families not observed during training and outperforms conventional family modeling approaches, especially when MSAs are small. Our generative approach accurately models epistasis and indels and, unlike other approaches, allows for exact inference and efficient sampling. We demonstrate the protein sequence modeling capabilities of our MSA-to-protein transformer and compare it with alternative sequence modeling approaches in comprehensive benchmark experiments.
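
The claim of exact inference and efficient sampling is what one would expect from a model that factorizes the sequence probability one residue at a time, p(x | MSA) = ∏ p(x_i | x_<i, MSA). The abstract does not spell out the decoder, so the autoregressive factorization is an assumption here. The sketch below is a toy illustration only: `next_token_probs` is a hypothetical stand-in that uses smoothed MSA column frequencies, not the trained transformer, and `END`, `log_likelihood`, and `sample` are illustrative names that do not come from the paper.

```python
import math
import random
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
END = "$"  # hypothetical end-of-sequence token


def next_token_probs(prefix, msa, pseudocount=1.0):
    """Toy stand-in for p(x_i | x_<i, MSA).

    Uses smoothed residue frequencies of MSA column i and ignores gaps.
    A real model would condition on the whole prefix and on a learned
    encoding of the alignment; this is only a placeholder.
    """
    i = len(prefix)
    if i >= len(msa[0]):
        return {END: 1.0}  # toy rule: stop once the alignment width is reached
    counts = Counter(row[i] for row in msa if row[i] in ALPHABET)
    total = sum(counts.values()) + pseudocount * len(ALPHABET)
    return {a: (counts[a] + pseudocount) / total for a in ALPHABET}


def log_likelihood(seq, msa):
    """Exact log p(seq | MSA) via the chain rule: sum of per-token log-probs."""
    logp = 0.0
    prefix = ""
    for tok in seq + END:
        p = next_token_probs(prefix, msa).get(tok, 0.0)
        if p == 0.0:
            return float("-inf")
        logp += math.log(p)
        prefix += tok
    return logp


def sample(msa, rng):
    """Ancestral sampling: draw one token at a time until END is drawn."""
    prefix = ""
    while True:
        probs = next_token_probs(prefix, msa)
        tokens = list(probs)
        tok = rng.choices(tokens, weights=[probs[t] for t in tokens])[0]
        if tok == END:
            return prefix
        prefix += tok


if __name__ == "__main__":
    msa = ["ACD", "ACE", "A-D"]  # made-up three-row alignment
    seq = sample(msa, random.Random(0))
    print(seq, log_likelihood(seq, msa))
```

Because every conditional is a normalized distribution, the likelihood needs no approximation and sampling costs one forward step per residue. The toy conditional cannot express epistasis or indels, since it ignores the prefix; a learned conditional that reads the prefix would be needed for that.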

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
