LPDP: Inference-Time Reward Control for Variable-Length DNA Generation with Edit Flows

Jeongchan Kim, Yunkyung Ko, Jong Chul Ye

arXiv:2605.11368

AI Analysis

This work addresses the need for variable-length, biologically plausible DNA sequence generation with reward control, which is a novel capability for synthetic biology and regulatory sequence design.

LPDP enables inference-time reward control for variable-length DNA generation using edit flows, achieving up to 0.8 reward improvement in enhancer optimization and 0.6 in exon-intron-exon inpainting tasks.

We study the application of the recently proposed Edit Flows to inference-time reward control in DNA sequence generation. Unlike most reward-guided DNA generation frameworks, which operate on fixed-length sequence spaces, Edit Flows have the potential to generate variable-length DNA through biologically plausible insertion, deletion, and substitution operations. We propose Local Perturbation Discrete Programming (LPDP), a training-free local re-solving operator, aware of both intermediate states and edit actions, for variable-length DNA edit-action generators at inference time. At each guided rollout step, LPDP scores one-step root edits, retains a near-best root band, and re-ranks each retained root by solving a bounded local discrete program around its child sequence. This local program uses the typed geometry of edit actions to focus on coherent substitution, insertion, or deletion subgraphs, and aggregates local continuations with either a hard Max backup or a soft log-sum-exponential (LSE) backup. We instantiate LPDP in two regimes: front-loaded reward tilting for enhancer optimization, where early edits are critical for establishing global regulatory sequence structure, and back-loaded reward tilting for exon-intron-exon inpainting, where late edits fine-tune splice-boundary contexts.
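The guided step described in the abstract (score root edits, keep a near-best band, re-rank by a bounded local program, back up with Max or LSE) can be illustrated with a toy sketch. This is not the authors' implementation: the names `one_step_edits`, `backup`, `lpdp_step`, and `toy_reward`, the additive band width `band`, the temperature `tau`, the depth-1 same-type continuation set, and the motif-counting reward are all assumptions made for illustration. The sketch also ignores the Edit Flow model's own edit rates and the front-loaded and back-loaded tilting schedules, and ranks candidates by reward alone.

```python
import math

ALPHABET = "ACGT"

def one_step_edits(seq):
    """Enumerate (edit_type, child_sequence) for every single-base edit."""
    edits = []
    for i in range(len(seq)):
        for b in ALPHABET:
            if b != seq[i]:
                edits.append(("sub", seq[:i] + b + seq[i + 1:]))
        if len(seq) > 1:  # never delete down to an empty sequence
            edits.append(("del", seq[:i] + seq[i + 1:]))
    for i in range(len(seq) + 1):
        for b in ALPHABET:
            edits.append(("ins", seq[:i] + b + seq[i:]))
    return edits

def backup(values, mode="max", tau=1.0):
    """Aggregate continuation values: hard Max or soft log-sum-exp (LSE)."""
    m = max(values)
    if mode == "max":
        return m
    # Numerically stable tau * log(sum(exp(v / tau))); assumed LSE form.
    return m + tau * math.log(sum(math.exp((v - m) / tau) for v in values))

def lpdp_step(seq, reward, band=0.5, mode="max", tau=1.0):
    """One toy guided step; returns (value, edit_type, child_sequence)."""
    # 1) Score all one-step root edits by the reward of their child.
    roots = [(t, child, reward(child)) for t, child in one_step_edits(seq)]
    # 2) Retain the near-best band: roots within `band` of the best score.
    best = max(r for _, _, r in roots)
    retained = [x for x in roots if x[2] >= best - band]
    # 3) Re-rank each retained root with a bounded local program: here,
    #    depth-1 continuations of the same edit type as the root (a crude
    #    stand-in for a typed subgraph), plus the option of stopping.
    scored = []
    for t, child, r in retained:
        conts = [reward(c) for t2, c in one_step_edits(child) if t2 == t]
        scored.append((backup(conts + [r], mode, tau), t, child))
    return max(scored)

def toy_reward(seq):
    """Hypothetical reward: count of TATA motifs minus a length penalty."""
    return seq.count("TATA") - 0.01 * len(seq)

if __name__ == "__main__":
    print(lpdp_step("TTTA", toy_reward, band=0.5, mode="max"))
    print(lpdp_step("TTTA", toy_reward, band=0.5, mode="lse", tau=0.1))
```

In this sketch the Max backup ranks a root by its single best local continuation, while the LSE backup also credits roots that have many good continuations; as `tau` shrinks, the LSE value approaches the Max value.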
