Sampling Data with Chains of Forward-Backward Diffusion Steps
For researchers in generative modeling and sampling, this work provides theoretical and empirical insight into the mixing behavior of diffusion-based samplers. Its findings are primarily diagnostic and do not offer a practical improvement.
The paper introduces U-turn chains, Markov chains built from short forward-backward diffusion steps with a Metropolis-Hastings correction, and shows an ergodicity-breaking phase transition in minimal U-turn dynamics on synthetic languages. On natural language and images, minimal U-turns relax slowly for high-level features, and the layer-ordering inversion appears only at large noise, consistent with strongly constrained, weakly mixing local dynamics.
Sampling from learned high-dimensional distributions is a foundational computational problem. We introduce U-turn chains: Markov chains obtained by iterating short forward-backward steps of a diffusion model, in which each step proposes a move that remains on the learned data manifold and, paired with a Metropolis-Hastings correction, samples from energy-modified targets. For synthetic languages, we show that minimal U-turn dynamics undergoes an ergodicity-breaking phase transition driven by fragmentation of the data manifold; ergodicity is restored at larger U-turn magnitude. In the non-ergodic regime, low-level features relax faster than high-level ones, an ordering that inverts only at sufficiently large U-turn magnitude. We test these predictions on natural language and natural images. In both modalities, minimal U-turns relax slowly, especially for high-level features approximated by deep representations in CNNs or LLMs. The layer-ordering inversion appears only at large noise when mixing is efficient -- signatures consistent with strongly constrained, weakly mixing local dynamics. We discuss the implications of these results for sampling with diffusion models.
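To make the construction concrete, here is a minimal toy sketch of a U-turn chain. It is an illustration under stated assumptions and not the paper's implementation. The "data manifold" is a hypothetical two-mode 1D Gaussian mixture rather than a synthetic language, and the learned denoiser is replaced by the exact posterior, which is available in closed form for this toy. The function names `u_turn` and `u_turn_chain`, the noise level `s`, and the acceptance rule are all assumptions. With an exact posterior, the forward-backward proposal is reversible with respect to the data distribution, so a target proportional to p(x) exp(-E(x)) can be corrected with acceptance probability min(1, exp(E(x) - E(x'))).

```python
import math
import random


def u_turn(x, s, means, var, rng):
    """One forward-backward step at noise level s in (0, 1) (toy, exact posterior)."""
    a = math.sqrt(1.0 - s)
    # Forward: variance-preserving noising of the current sample.
    x_noisy = a * x + math.sqrt(s) * rng.gauss(0.0, 1.0)
    # Backward: sample x0 from the exact posterior p(x0 | x_noisy) of an
    # equal-weight Gaussian mixture (stand-in for a learned reverse process).
    var_noisy = a * a * var + s
    log_w = [-(x_noisy - a * m) ** 2 / (2.0 * var_noisy) for m in means]
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]
    m = rng.choices(means, weights=weights)[0]
    post_mean = m + (a * var / var_noisy) * (x_noisy - a * m)
    post_var = var * s / var_noisy
    return post_mean + math.sqrt(post_var) * rng.gauss(0.0, 1.0)


def u_turn_chain(x_init, s, n_steps, means=(-3.0, 3.0), var=0.1,
                 energy=None, seed=0):
    """Iterate U-turns; optionally apply a Metropolis-Hastings correction.

    Assumption: the proposal is reversible w.r.t. the data distribution, so
    the acceptance ratio for the tilted target reduces to exp(E(x) - E(x')).
    """
    rng = random.Random(seed)
    x = x_init
    chain = [x]
    for _ in range(n_steps):
        proposal = u_turn(x, s, means, var, rng)
        if energy is not None:
            log_accept = min(0.0, energy(x) - energy(proposal))
            if rng.random() >= math.exp(log_accept):
                proposal = x  # reject: stay at the current state
        x = proposal
        chain.append(x)
    return chain
```

In this toy, a chain started at `-3.0` with small `s` (for example `0.05`) stays in the left mode, while a large `s` (for example `0.99`) lets it hop between modes. This mirrors, only qualitatively, the loss of ergodicity at minimal U-turn magnitude and its restoration at larger magnitude. Passing an `energy` function that penalizes `x >= 0` keeps the chain in the left mode even at large `s`.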