LG · AI · GN · Mar 11

Continuous Diffusion Transformers for Designing Synthetic Regulatory Elements

arXiv:2603.10885v1 · 12.2 · h-index: 9
Predicted impact top 36% in LG · last 90 days · Originality: Highly original
AI Analysis

This work addresses the design of cell-type-specific regulatory elements for genomics applications, applying a novel method to a known bottleneck.

The authors generate synthetic regulatory DNA sequences with a parameter-efficient Diffusion Transformer that replaces the U-Net backbone in DNA-Diffusion. The model converges to a 39% lower validation loss, matches the U-Net's best validation loss in 60× fewer epochs, and reduces memorization from 5.3% to 1.7%.

We present a parameter-efficient Diffusion Transformer (DiT) for generating 200bp cell-type-specific regulatory DNA sequences. By replacing the U-Net backbone of DNA-Diffusion with a transformer denoiser equipped with a 2D CNN input encoder, our model matches the U-Net's best validation loss in 13 epochs (60× fewer) and converges to a 39% lower loss, while reducing memorization, measured as the share of generated sequences that align to training data via BLAT, from 5.3% to 1.7%. Ablations show the CNN encoder is essential: without it, validation loss increases 70% regardless of positional embedding choice. We further apply DDPO finetuning using Enformer as a reward model, achieving a 38× improvement in predicted regulatory activity. Cross-validation against DRAKES on an independent prediction task confirms that improvements reflect genuine regulatory signal rather than reward model overfitting.
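As a rough illustration of two quantities the abstract mentions, the sketch below one-hot encodes a DNA string into a 4 × L matrix (a common input layout for convolutional encoders, assumed here rather than taken from the paper) and computes a memorization rate as the fraction of generated sequences with an alignment hit. The function names `one_hot`, `memorization_rate`, and `relative_reduction` are hypothetical, and the alignment counts would come from an external tool such as BLAT, which this snippet does not run.

```python
# Illustrative sketch only; not the authors' code.
# Assumption: sequences use the alphabet A, C, G, T with no ambiguity codes.
ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a 4 x len(seq) matrix with rows ordered A, C, G, T."""
    seq = seq.upper()
    unknown = set(seq) - set(ALPHABET)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    return [[1 if base == letter else 0 for base in seq] for letter in ALPHABET]


def memorization_rate(n_aligned, n_generated):
    """Fraction of generated sequences that aligned to the training data."""
    if n_generated <= 0:
        raise ValueError("n_generated must be positive")
    return n_aligned / n_generated


def relative_reduction(before, after):
    """Relative drop from `before` to `after`, as a fraction of `before`."""
    return (before - after) / before


if __name__ == "__main__":
    matrix = one_hot("ACGT" * 50)           # a 200bp toy sequence
    print(len(matrix), len(matrix[0]))      # rows and columns of the encoding
    # Reported rates: 5.3% (U-Net) versus 1.7% (DiT).
    print(round(relative_reduction(5.3, 1.7), 3))
```

With the reported rates, the relative reduction in memorization works out to roughly 68%, which is simply (5.3 − 1.7) / 5.3.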

