Improving Diffusion Language Model Decoding through Joint Search in Generation Order and Token Space
This work addresses decoding inefficiencies in Diffusion Language Models on tasks such as mathematical reasoning and coding, and represents an incremental improvement over existing decoding methods.
The paper tackles the limited exploration of decoding trajectories in Diffusion Language Models by introducing Order-Token Search, which jointly searches over generation order and token values. It reports absolute gains over the backbone of 3.1% on GSM8K, 3.8% on MATH500, 7.9% on Countdown, and 6.8% on HumanEval.
Diffusion Language Models (DLMs) offer order-agnostic generation that can explore many possible decoding trajectories. However, current decoding methods commit to a single trajectory, which limits exploration of the trajectory space. We introduce Order-Token Search, which explores this space by jointly searching over generation order and token values. Its core is a likelihood estimator that scores denoising actions, enabling stable pruning and efficient exploration of diverse trajectories. Across mathematical reasoning and coding benchmarks, Order-Token Search consistently outperforms baselines on GSM8K, MATH500, Countdown, and HumanEval, with absolute gains of 3.1%, 3.8%, 7.9%, and 6.8% over the backbone, respectively, matching or surpassing the diffu-GRPO post-trained d1-LLaDA. Our work establishes joint search as a key component for advancing decoding in DLMs.
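
To make the idea of searching jointly over generation order and token values concrete, the following is a minimal sketch and not the paper's algorithm. It runs a beam-style search in which each action is a (position, token) pair, so beams can differ both in which position they unmask next and in which token they place there. Everything here is an assumption made for illustration: `toy_denoiser` is a hypothetical stand-in for a DLM forward pass, and the cumulative log-probability used for pruning is a simple placeholder for the likelihood estimator the abstract describes, whose actual form is not given here.

```python
import math
from typing import Dict, List, Optional, Tuple

MASK = None  # marker for a position that has not been denoised yet
State = Tuple[Optional[str], ...]


def toy_denoiser(state: State) -> Dict[int, Dict[str, float]]:
    """Hypothetical stand-in for a DLM forward pass.

    Returns a token distribution for every masked position. The toy rule
    prefers a token that differs from the already-filled left neighbour.
    """
    out: Dict[int, Dict[str, float]] = {}
    for i, tok in enumerate(state):
        if tok is not MASK:
            continue
        left = state[i - 1] if i > 0 else None
        if left == "a":
            out[i] = {"a": 0.2, "b": 0.8}
        elif left == "b":
            out[i] = {"a": 0.8, "b": 0.2}
        else:  # no left neighbour, or left neighbour still masked
            out[i] = {"a": 0.6, "b": 0.4}
    return out


def order_token_beam_search(
    length: int, beam_width: int = 2, actions_per_state: int = 3
) -> List[Tuple[State, float]]:
    """Beam search whose actions are (position, token) pairs.

    Assumption: partial trajectories are scored by summed log-probability,
    a simple placeholder for a learned or estimated likelihood score.
    """
    beams: List[Tuple[State, float]] = [((MASK,) * length, 0.0)]
    for _ in range(length):  # each step fills exactly one masked position
        candidates: Dict[State, float] = {}
        for state, score in beams:
            dists = toy_denoiser(state)
            # Enumerate every (position, token) action and rank by log-prob.
            actions = [
                (math.log(p), pos, tok)
                for pos, dist in dists.items()
                for tok, p in dist.items()
            ]
            actions.sort(reverse=True)
            for logp, pos, tok in actions[:actions_per_state]:
                new_state = state[:pos] + (tok,) + state[pos + 1:]
                new_score = score + logp
                # Different orders can reach the same state; keep the best.
                if new_score > candidates.get(new_state, -math.inf):
                    candidates[new_state] = new_score
        # Prune to the highest-scoring partial trajectories.
        beams = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
        beams = beams[:beam_width]
    return beams


if __name__ == "__main__":
    for state, score in order_token_beam_search(length=3):
        print(state, round(score, 3))
```

In this sketch, a conventional single-trajectory decoder corresponds to `beam_width=1` and `actions_per_state=1`; widening either parameter is what lets the search keep alternative orders and alternative tokens alive at the same time.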