LG · Jul 7, 2025

Discrete Diffusion Trajectory Alignment via Stepwise Decomposition

Jiaqi Han, Austin Wang, Minkai Xu, Wenda Chu, Meihua Dang, Yisong Yue, Stefano Ermon

arXiv:2507.04832v2 · 19.78 citations · h-index: 14

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficiently optimizing discrete diffusion models for sequence data. The proposed method is compatible with arbitrary reward functions and shows gains in domains such as DNA and protein design, though the advance is incremental because it builds on existing diffusion and alignment techniques.

The paper aligns discrete diffusion models with reward functions through an offline preference optimization method that decomposes trajectory alignment into stepwise objectives. It reports up to a 12% improvement in predicted activity over the most competitive RL-based baseline on DNA sequence design, and raises the GSM8K score of LLaDA-8B-Instruct from 78.6 to 81.2 (a 2.6-point gain) in language modeling.

Discrete diffusion models have demonstrated great promise in modeling various sequence data, ranging from human language to biological sequences. Inspired by the success of RL in language models, there is growing interest in further improving these models by aligning them with a given reward. In this work, we propose an offline preference optimization method to approach trajectory alignment for discrete diffusion models. Instead of applying the reward on the final output and backpropagating the gradient through the entire denoising process, we decompose the problem into a set of stepwise alignment objectives by matching the per-step posterior. This framework enables efficient diffusion optimization, is compatible with arbitrary reward functions, and, importantly, yields an equivalent optimal solution under additive factorization of the trajectory reward. Experiments across multiple domains, including DNA sequence design, protein inverse folding, and language modeling, consistently demonstrate the superiority of our approach. Notably, it achieves up to a 12% improvement over the most competitive RL-based baseline in terms of predicted activity on DNA sequence design, and further improves the GSM8K score from 78.6 to 81.2 on LLaDA-8B-Instruct for language modeling.
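
The abstract does not state the stepwise objective itself. As a rough illustration of what aligning a single denoising step with a reward can look like, the sketch below reweights a categorical reference posterior by exp(reward / beta) and renormalizes, which is the standard closed-form optimum of a KL-regularized reward objective. This is an assumption made for illustration, not the paper's loss; the names `tilt_posterior`, `ref_probs`, `rewards`, and `beta` are hypothetical, and the numbers are toy values.

```python
import math


def tilt_posterior(ref_probs, rewards, beta=1.0):
    """Reweight a categorical distribution by exp(reward / beta) and renormalize.

    Illustrative only: this is the generic KL-regularized optimum for one
    categorical step, not the objective defined in the paper.
    """
    if len(ref_probs) != len(rewards):
        raise ValueError("ref_probs and rewards must have the same length")
    if beta <= 0:
        raise ValueError("beta must be positive")

    # Subtract the maximum reward before exponentiating for numerical stability;
    # the shift cancels out in the normalization.
    r_max = max(rewards)
    weights = [p * math.exp((r - r_max) / beta) for p, r in zip(ref_probs, rewards)]

    total = sum(weights)
    if total == 0:
        raise ValueError("reference distribution has no mass")
    return [w / total for w in weights]


# Toy example: three candidate clean tokens for one denoising step.
ref = [0.5, 0.3, 0.2]        # hypothetical reference posterior
rewards = [0.0, 1.0, 2.0]    # hypothetical per-candidate rewards
target = tilt_posterior(ref, rewards, beta=1.0)
```

A smaller `beta` concentrates the tilted distribution on high-reward candidates, while a larger `beta` keeps it closer to the reference.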
