LG · Jan 24, 2022

ReLSO: A Transformer-based Model for Latent Space Optimization and Generation of Proteins

arXiv:2201.09948v2 · 1 citation
AI Analysis

This work addresses the challenge of efficiently generating high-fitness protein sequences for applications in biotechnology and drug discovery, and proposes a new method for traversing the fitness landscape.

The authors tackle the problem of optimizing protein sequences for higher fitness by introducing ReLSO, a transformer-based autoencoder that jointly generates sequences and predicts fitness. They report greater sequence optimization efficiency than other approaches on datasets such as variant sets of anti-ranibizumab and GFP.

The development of powerful natural language models has increased the ability to learn meaningful representations of protein sequences. In addition, advances in high-throughput mutagenesis, directed evolution, and next-generation sequencing have allowed for the accumulation of large amounts of labeled fitness data. Leveraging these two trends, we introduce Regularized Latent Space Optimization (ReLSO), a deep transformer-based autoencoder which features a highly structured latent space that is trained to jointly generate sequences as well as predict fitness. Through regularized prediction heads, ReLSO introduces a powerful protein sequence encoder and a novel approach for efficient fitness landscape traversal. Using ReLSO, we explicitly model the sequence-function landscape of large labeled datasets and generate new molecules by optimizing within the latent space using gradient-based methods. We evaluate this approach on several publicly available protein datasets, including variant sets of anti-ranibizumab and GFP. We observe a greater sequence optimization efficiency (increase in fitness per optimization step) by ReLSO compared to other approaches, where ReLSO more robustly generates high-fitness sequences. Furthermore, the attention-based relationships learned by the jointly trained ReLSO models provide a potential avenue towards sequence-level fitness attribution information.
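
The idea of optimizing within a latent space using gradient-based methods, and of measuring efficiency as the increase in fitness per optimization step, can be illustrated with a toy sketch. This is not the ReLSO implementation: the quadratic `predict_fitness` function stands in for a learned fitness prediction head, and the `peak` vector, the step size, and all function names are hypothetical choices made only for illustration. A real model would also decode the optimized latent point back into a sequence, which is omitted here.

```python
# Toy sketch: gradient ascent on a fitness predictor defined over a latent space.
# Everything here is hypothetical; it does not reproduce the ReLSO architecture.


def predict_fitness(z, peak):
    """Stand-in fitness head: a concave quadratic that is highest at `peak`."""
    return -sum((zi - pi) ** 2 for zi, pi in zip(z, peak))


def fitness_gradient(z, peak):
    """Analytic gradient of predict_fitness with respect to z."""
    return [-2.0 * (zi - pi) for zi, pi in zip(z, peak)]


def optimize_latent(z0, peak, step_size=0.1, n_steps=20):
    """Run plain gradient ascent from z0, recording predicted fitness at each step."""
    z = list(z0)
    trajectory = [predict_fitness(z, peak)]
    for _ in range(n_steps):
        grad = fitness_gradient(z, peak)
        # Move the latent point in the direction that increases predicted fitness.
        z = [zi + step_size * gi for zi, gi in zip(z, grad)]
        trajectory.append(predict_fitness(z, peak))
    return z, trajectory


def optimization_efficiency(trajectory):
    """Average increase in predicted fitness per optimization step."""
    n_steps = len(trajectory) - 1
    return (trajectory[-1] - trajectory[0]) / n_steps


peak = [1.0, -2.0, 0.5]    # hypothetical latent location of the fitness optimum
z_start = [0.0, 0.0, 0.0]  # hypothetical encoding of a starting sequence
z_final, fitness_path = optimize_latent(z_start, peak)
efficiency = optimization_efficiency(fitness_path)
print(f"fitness: {fitness_path[0]:.3f} -> {fitness_path[-1]:.3f}")
print(f"efficiency (fitness gain per step): {efficiency:.4f}")
```

In this toy setting the predicted fitness rises at every step because the surrogate is smooth and concave. The abstract attributes ReLSO's efficiency to a highly structured, regularized latent space; whether a learned latent space behaves this well depends on the training described in the paper, not on anything shown in this sketch.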

