Effective Test-Time Scaling of Discrete Diffusion through Iterative Refinement
This work addresses test-time scaling for discrete diffusion models, a largely unexplored area, and offers a promising approach to reward-guided generation, though the contribution appears incremental because it builds on existing diffusion frameworks.
The paper tackles test-time scaling for discrete diffusion models by introducing Iterative Reward-Guided Refinement (IterRef), which uses reward-guided noising-denoising transitions to refine misaligned intermediate states. It reports consistent improvements in generation quality, with especially large gains under low compute budgets.
Test-time scaling through reward-guided generation remains largely unexplored for discrete diffusion models despite its potential as a promising alternative. In this work, we introduce Iterative Reward-Guided Refinement (IterRef), a novel test-time scaling method tailored to discrete diffusion that leverages reward-guided noising-denoising transitions to progressively refine misaligned intermediate states. We formalize this process within a Multiple-Try Metropolis (MTM) framework, proving convergence to the reward-aligned distribution. Unlike prior methods that assume the current state is already aligned with the reward distribution and only guide the subsequent transition, our approach explicitly refines each state in situ, progressively steering it toward the optimal intermediate distribution. Across both text and image domains, we evaluate IterRef on diverse discrete diffusion models and observe consistent improvements in reward-guided generation quality. In particular, IterRef achieves striking gains under low compute budgets, far surpassing prior state-of-the-art baselines.
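To make the Multiple-Try Metropolis idea concrete, the following is a minimal toy sketch of a generic MTM refinement loop on a discrete token sequence. It is not the IterRef algorithm itself. Several things here are assumptions for illustration only: the target is taken to be proportional to exp(reward / temperature) with a uniform base in place of a pretrained diffusion model, the noising-denoising transition is replaced by a symmetric proposal that overwrites one random position with a uniform token, and the names `mtm_refine`, `match_reward`, and `TARGET` are hypothetical.

```python
import math
import random


def mtm_refine(x, reward, vocab, temperature=0.1, n_tries=4, n_steps=500, seed=0):
    """Toy Multiple-Try Metropolis refinement of a discrete sequence.

    Assumed target: pi(x) proportional to exp(reward(x) / temperature).
    Assumed proposal: symmetric single-token resampling (a stand-in for a
    noising-denoising transition, not the paper's actual kernel).
    """
    rng = random.Random(seed)

    def weight(state):
        # With a symmetric proposal, the MTM weight can be chosen as pi(state).
        return math.exp(reward(state) / temperature)

    def propose(state):
        # Overwrite one uniformly chosen position with a uniformly chosen token.
        i = rng.randrange(len(state))
        return state[:i] + (rng.choice(vocab),) + state[i + 1:]

    x = tuple(x)
    for _ in range(n_steps):
        # 1. Draw several candidates from the current state and weight them.
        trials = [propose(x) for _ in range(n_tries)]
        w_fwd = [weight(y) for y in trials]
        # 2. Select one candidate with probability proportional to its weight.
        y = rng.choices(trials, weights=w_fwd, k=1)[0]
        # 3. Draw a reference set from the candidate; the current state is its last member.
        refs = [propose(y) for _ in range(n_tries - 1)] + [x]
        w_bwd = [weight(z) for z in refs]
        # 4. Accept with the generalized Metropolis ratio.
        if rng.random() < min(1.0, sum(w_fwd) / sum(w_bwd)):
            x = y
    return x


# Hypothetical reward: number of positions that agree with a fixed target sequence.
TARGET = (2, 0, 3, 1, 1, 2)


def match_reward(state):
    return sum(a == b for a, b in zip(state, TARGET))


refined = mtm_refine((0,) * 6, match_reward, vocab=(0, 1, 2, 3))
print(refined, match_reward(refined))
```

In this sketch, drawing several candidates per step and selecting among them by weight is what lets the chain move toward high-reward states quickly, while the acceptance step keeps the assumed target distribution stationary. A real setting would replace the uniform proposal with model-based noising and denoising, in which case the weights would also need to account for the proposal probabilities.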