The Diffusion Duality
This work addresses slow text generation in diffusion language models by adapting techniques from Gaussian diffusion to speed up both training and sampling of uniform-state discrete diffusion models.
The paper narrows the performance gap between uniform-state discrete diffusion models and autoregressive or masked diffusion models in text generation by exploiting the connection between uniform-state diffusion and Gaussian diffusion, which speeds up both training and sampling. Models trained with its curriculum surpass autoregressive models in zero-shot perplexity on 3 of 7 benchmarks, and its distillation algorithm accelerates sampling by two orders of magnitude.
Uniform-state discrete diffusion models hold the promise of fast text generation due to their inherent ability to self-correct. However, they are typically outperformed by autoregressive models and masked diffusion models. In this work, we narrow this performance gap by leveraging a key insight: Uniform-state diffusion processes naturally emerge from an underlying Gaussian diffusion. Our method, Duo, transfers powerful techniques from Gaussian diffusion to improve both training and sampling. First, we introduce a curriculum learning strategy guided by the Gaussian process, doubling training speed by reducing variance. Models trained with curriculum learning surpass autoregressive models in zero-shot perplexity on 3 of 7 benchmarks. Second, we present Discrete Consistency Distillation, which adapts consistency distillation from the continuous to the discrete setting. This algorithm unlocks few-step generation in diffusion language models by accelerating sampling by two orders of magnitude. We provide the code and model checkpoints on the project page: http://s-sahoo.github.io/duo
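The key insight can be illustrated numerically. The sketch below is not the Duo implementation: it assumes that the discrete process is obtained by taking the argmax of a Gaussian-diffused one-hot vector, and it assumes a variance-preserving noise scale `sqrt(1 - alpha**2)`. The function names, the toy vocabulary size, and the value of `alpha` are hypothetical choices made only for illustration. Under these assumptions, symmetry implies that every token other than the original is equally likely after the argmax, which is the shape of a uniform-state marginal.

```python
import numpy as np

def argmax_of_gaussian_diffusion(token, vocab_size, alpha, n_samples, rng):
    """Add Gaussian noise to a one-hot token, then discretize with argmax."""
    x = np.zeros(vocab_size)
    x[token] = 1.0
    # Assumption: variance-preserving noise scale for signal level alpha.
    sigma = np.sqrt(1.0 - alpha ** 2)
    noise = rng.standard_normal((n_samples, vocab_size))
    latents = alpha * x + sigma * noise  # Gaussian-diffused latents
    return latents.argmax(axis=1)        # discrete tokens

def empirical_marginal(samples, vocab_size):
    """Estimate the categorical distribution of the discretized tokens."""
    return np.bincount(samples, minlength=vocab_size) / len(samples)

rng = np.random.default_rng(0)
vocab_size, token = 5, 2  # toy values, not from the paper
samples = argmax_of_gaussian_diffusion(token, vocab_size, 0.6, 200_000, rng)
marginal = empirical_marginal(samples, vocab_size)
print(marginal)  # mass concentrates on `token`; the rest is spread evenly
```

As `alpha` decreases toward 0, the estimated marginal should approach the uniform distribution over the vocabulary; as it approaches 1, the original token is almost always kept.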