LG, AI, QM | Sep 15, 2024

Latent Diffusion Models for Controllable RNA Sequence Generation

arXiv:2409.09828v2 | 9 citations | h-index: 15
Originality: Incremental advance
AI Analysis

This work addresses the design of functional RNA sequences for research and therapeutic applications. It is an incremental advance: it combines existing components, namely BERT-type encoders and diffusion models, and applies them to RNA generation.

The paper tackles the generation and optimization of variable-length RNA sequences with RNAdiffusion, a latent diffusion model that integrates reward gradients for functional properties. The generated non-coding RNAs align with natural distributions, and the optimized mRNA 5'-UTRs reach high Mean Ribosome Loading (MRL) and Translation Efficiency (TE), outperforming baselines in balancing reward against structural stability.

This work presents RNAdiffusion, a latent diffusion model for generating and optimizing discrete RNA sequences of variable lengths. RNA is a key intermediary between DNA and protein, exhibiting high sequence diversity and complex three-dimensional structures to support a wide range of functions. We utilize pretrained BERT-type models to encode raw RNA sequences into token-level, biologically meaningful representations. A Query Transformer is employed to compress these representations into a fixed-length set of latent vectors, and an autoregressive decoder is trained to reconstruct RNA sequences from these latent variables. We then develop a continuous diffusion model within this latent space. To enable optimization, we integrate the gradients of reward models (surrogates for RNA functional properties) into the backward diffusion process, thereby generating RNAs with high reward scores. Empirical results confirm that RNAdiffusion generates non-coding RNAs that align with natural distributions across various biological metrics. Further, we fine-tune the diffusion model on mRNA 5' untranslated regions (5'-UTRs) and optimize sequences for high translation efficiency. Our guided diffusion model effectively generates diverse 5'-UTRs with high Mean Ribosome Loading (MRL) and Translation Efficiency (TE), outperforming baselines in balancing the trade-off between reward and structural stability. Our findings hold potential for advancing RNA sequence-function research and therapeutic RNA design.
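
The reward-guided backward diffusion described above can be illustrated with a toy sketch. It is not the paper's implementation. It assumes a standard-normal latent prior whose score is known in closed form (standing in for a trained denoiser), a hypothetical linear reward `r(z) = w . z` (standing in for a learned MRL or TE surrogate), and a DDPM-style update in which the reward gradient is simply added to the score. The names `toy_score`, `reward`, `reward_grad`, and `guided_sample`, and the noise schedule, are all illustrative assumptions.

```python
import numpy as np


def reward(z, w):
    """Hypothetical linear reward surrogate r(z) = w . z over latent vectors."""
    return z @ w


def reward_grad(z, w):
    """Gradient of the linear reward with respect to z (identical for every sample)."""
    return np.broadcast_to(w, z.shape)


def toy_score(z):
    """Score of a standard-normal latent prior, grad log N(z; 0, I) = -z.

    Assumption: this closed form replaces the trained diffusion network.
    """
    return -z


def guided_sample(n, dim, w, scale, betas, rng):
    """Run a DDPM-style reverse chain, adding scale * grad r(z) to the score."""
    z = rng.standard_normal((n, dim))  # start from pure noise
    for beta in betas[::-1]:  # walk from the noisiest step to the cleanest
        score = toy_score(z) + scale * reward_grad(z, w)
        mean = (z + beta * score) / np.sqrt(1.0 - beta)
        # For simplicity, noise is added at every step, including the last.
        z = mean + np.sqrt(beta) * rng.standard_normal(z.shape)
    return z


# Usage: compare unguided and guided samples under the same random seed.
w = np.zeros(8)
w[0] = 1.0  # reward only depends on the first latent coordinate
betas = np.linspace(1e-4, 0.02, 100)  # assumed noise schedule
plain = guided_sample(512, 8, w, 0.0, betas, np.random.default_rng(0))
guided = guided_sample(512, 8, w, 2.0, betas, np.random.default_rng(0))
print("mean reward, unguided:", reward(plain, w).mean())
print("mean reward, guided:  ", reward(guided, w).mean())
```

With `scale = 0` the chain samples from the prior; a positive `scale` shifts samples toward higher reward. In a real system the latents would then be passed to a decoder to obtain sequences, and a larger `scale` would typically trade sample naturalness for reward.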
