LG · QM · Sep 25, 2024

CodonMPNN for Organism Specific and Codon Optimal Inverse Folding

arXiv:2409.17265v1 · 3 citations · h-index: 8 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses inefficient protein expression in host organisms such as yeast, a concern for synthetic biology applications, and represents an incremental improvement over existing inverse folding methods.

The paper tackles low expression yields in protein engineering caused by suboptimal codon sequences. It proposes CodonMPNN, which generates a codon sequence conditioned on a protein backbone structure and an organism label. Compared to baselines, CodonMPNN recovers wild-type codons more frequently, and for a given protein sequence it is more likely to generate high-fitness codon sequences than low-fitness ones.

Generating protein sequences conditioned on protein structures is an impactful technique for protein engineering. When synthesizing engineered proteins, they are commonly translated into DNA and expressed in an organism such as yeast. One difficulty in this process is that the expression rates can be low due to suboptimal codon sequences for expressing a protein in a host organism. We propose CodonMPNN, which generates a codon sequence conditioned on a protein backbone structure and an organism label. If naturally occurring DNA sequences are close to codon optimality, CodonMPNN could learn to generate codon sequences with higher expression yields than heuristic codon choices for generated amino acid sequences. Experiments show that CodonMPNN retains the performance of previous inverse folding approaches and recovers wild-type codons more frequently than baselines. Furthermore, CodonMPNN has a higher likelihood of generating high-fitness codon sequences than low-fitness codon sequences for the same protein sequence. Code is available at https://github.com/HannesStark/CodonMPNN.
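To make the abstract's terms concrete, the following is a minimal sketch of one possible "heuristic codon choice" (picking the most frequent codon per amino acid for a given organism) and of a codon recovery score against a wild-type sequence. It is not taken from the CodonMPNN code. The organism labels, the usage frequencies in `TOY_CODON_USAGE`, and the function names are all hypothetical and chosen only for illustration; the frequencies are made up, not measured.

```python
from typing import Dict, List

# Toy codon usage table: organism -> amino acid -> codon -> frequency.
# ASSUMPTION: these frequencies are invented for illustration only.
# The codon-to-amino-acid assignments follow the standard genetic code.
TOY_CODON_USAGE: Dict[str, Dict[str, Dict[str, float]]] = {
    "organism_a": {
        "M": {"ATG": 1.0},
        "K": {"AAA": 0.6, "AAG": 0.4},
        "E": {"GAA": 0.7, "GAG": 0.3},
        "L": {"TTG": 0.5, "TTA": 0.3, "CTG": 0.2},
    },
    "organism_b": {
        "M": {"ATG": 1.0},
        "K": {"AAA": 0.3, "AAG": 0.7},
        "E": {"GAA": 0.4, "GAG": 0.6},
        "L": {"TTG": 0.1, "TTA": 0.1, "CTG": 0.8},
    },
}


def most_frequent_codons(protein: str, organism: str) -> List[str]:
    """Heuristic back-translation: pick the most frequent codon per residue."""
    usage = TOY_CODON_USAGE[organism]
    return [max(usage[aa], key=usage[aa].get) for aa in protein]


def codon_recovery(predicted: List[str], wild_type: List[str]) -> float:
    """Fraction of positions where the predicted codon equals the wild-type codon."""
    if len(predicted) != len(wild_type):
        raise ValueError("codon sequences must have the same length")
    if not wild_type:
        return 0.0
    matches = sum(p == w for p, w in zip(predicted, wild_type))
    return matches / len(wild_type)


# Example: the same protein gets different codons depending on the organism label.
codons_a = most_frequent_codons("MKEL", "organism_a")
codons_b = most_frequent_codons("MKEL", "organism_b")

# Hypothetical wild-type codon sequence for the same four residues.
wild_type = ["ATG", "AAG", "GAA", "TTG"]
recovery_a = codon_recovery(codons_a, wild_type)
```

A learned model such as the one described in the abstract would replace `most_frequent_codons` with predictions that also depend on the protein backbone structure; the recovery score would be computed in the same way.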

Code Implementations: 1 repo
