LG, AI · Jul 19, 2025

Language Models for Controllable DNA Sequence Design

arXiv:2507.19523v1 · 2 citations · h-index: 12 · Has Code · Trans. Mach. Learn. Res.
Originality: Incremental advance
AI Analysis

This work addresses the design of DNA sequences with specific biological functions, a problem of interest to researchers in genomics and synthetic biology. It is an incremental advance that applies language models to an underexplored domain.

The paper tackles controllable DNA sequence design by introducing ATGC-Gen, a transformer-based model that generates sequences conditioned on biological properties, and reports notable improvements in controllability and functional relevance over prior methods.
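To make "generates sequences conditioned on biological properties" concrete, here is a minimal sketch of decoder-style conditional sampling, assuming single-nucleotide tokens and a property given as a plain string label. This is a simplification: ATGC-Gen integrates biological signals through cross-modal encoding, which is not reproduced here. `sample_conditioned`, `toy_model`, and the `"high"` label are hypothetical names, and `toy_model` is a toy stand-in for a trained network, not the paper's model.

```python
import random
from typing import Callable, Dict

# Vocabulary: one token per nucleotide (an assumption for this sketch).
NUCLEOTIDES = ["A", "C", "G", "T"]

# Hypothetical model interface: (property label, sequence so far) -> next-nucleotide probabilities.
NextTokenFn = Callable[[str, str], Dict[str, float]]


def sample_conditioned(model: NextTokenFn, condition: str, length: int, seed: int = 0) -> str:
    """Autoregressively sample `length` nucleotides, conditioning every step on `condition`."""
    rng = random.Random(seed)  # seeded for reproducible samples
    seq = ""
    for _ in range(length):
        probs = model(condition, seq)  # distribution over the next nucleotide
        weights = [probs[n] for n in NUCLEOTIDES]
        seq += rng.choices(NUCLEOTIDES, weights=weights, k=1)[0]
    return seq


def toy_model(condition: str, prefix: str) -> Dict[str, float]:
    """Toy stand-in for a trained decoder: GC-biased for label "high", uniform otherwise.

    It ignores `prefix`; a real model would attend to the sequence generated so far.
    """
    if condition == "high":
        return {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}
    return {n: 0.25 for n in NUCLEOTIDES}


if __name__ == "__main__":
    print(sample_conditioned(toy_model, "high", 40))
```

The key design point is that the condition is passed to the model at every step, so each sampled nucleotide depends on both the target property and the prefix generated so far.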

We consider controllable DNA sequence design, where sequences are generated by conditioning on specific biological properties. While language models (LMs) such as GPT and BERT have achieved remarkable success in natural language generation, their application to DNA sequence generation remains largely underexplored. In this work, we introduce ATGC-Gen, an Automated Transformer Generator for Controllable Generation, which leverages cross-modal encoding to integrate diverse biological signals. ATGC-Gen is instantiated with both decoder-only and encoder-only transformer architectures, allowing flexible training and generation under either autoregressive or masked recovery objectives. We evaluate ATGC-Gen on representative tasks including promoter and enhancer sequence design, and further introduce a new dataset based on ChIP-Seq experiments for modeling protein binding specificity. Our experiments demonstrate that ATGC-Gen can generate fluent, diverse, and biologically relevant sequences aligned with the desired properties. Compared to prior methods, our model achieves notable improvements in controllability and functional relevance, highlighting the potential of language models in advancing programmable genomic design. The source code is available at https://github.com/divelab/AIRS/blob/main/OpenBio/ATGC_Gen.
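The abstract names two training objectives: autoregressive (decoder-only) and masked recovery (encoder-only). The sketch below shows how training pairs could be built for each, assuming a hypothetical five-token vocabulary and uniform random masking at a hypothetical 15% rate. The paper's actual tokenization, masking scheme, and property-conditioning inputs are not reproduced here; the helper names `encode`, `autoregressive_pair`, and `masked_pair` are illustrative only.

```python
import random
from typing import List, Tuple

# Hypothetical vocabulary: a mask token plus one token per nucleotide.
VOCAB = {"[MASK]": 0, "A": 1, "C": 2, "G": 3, "T": 4}
MASK_ID = VOCAB["[MASK]"]
IGNORE = -100  # label for positions excluded from the loss


def encode(seq: str) -> List[int]:
    """Map a DNA string to token ids, one id per nucleotide."""
    return [VOCAB[ch] for ch in seq.upper()]


def autoregressive_pair(ids: List[int]) -> Tuple[List[int], List[int]]:
    """Decoder-only objective: the target at step t is the input token at step t + 1."""
    return ids[:-1], ids[1:]


def masked_pair(ids: List[int], mask_rate: float = 0.15, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Encoder-only masked recovery: hide random positions and predict only those."""
    rng = random.Random(seed)
    inputs: List[int] = []
    labels: List[int] = []
    for tok in ids:
        if rng.random() < mask_rate:
            inputs.append(MASK_ID)  # the model sees the mask token here
            labels.append(tok)      # and must recover the original nucleotide
        else:
            inputs.append(tok)      # visible context
            labels.append(IGNORE)   # no loss on unmasked positions
    return inputs, labels


if __name__ == "__main__":
    ids = encode("ACGTTGCA")
    print(autoregressive_pair(ids))
    print(masked_pair(ids, mask_rate=0.3))
```

In the masked objective only hidden positions carry a label; the `-100` ignore value follows a common convention (the default `ignore_index` of PyTorch's cross-entropy loss), not necessarily the released ATGC-Gen code. BERT-style refinements such as replacing some selected tokens with random nucleotides are omitted.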

Code Implementations: 1 repo