Improved Sampling Schedules for Discrete Diffusion Models
This work targets sampling efficiency, a key bottleneck in discrete diffusion models for sequence data. The contribution is incremental in that it builds on existing discrete diffusion frameworks, but it offers researchers and practitioners improved performance at a lower computational cost.
The authors address inefficient sampling schedules in discrete diffusion models by analyzing the reverse process dynamics through thermodynamic entropy production. The analysis yields two schedules, the Entropic Discrete Schedule (EDS) and the Wasserstein Discrete Schedule (WDS), which the authors report outperform state-of-the-art strategies on synthetic data, music notation, vision, and language modeling at a lower computational budget.
Discrete diffusion models have emerged as a powerful paradigm for generative modeling on sequence data; however, the information-theoretic principles governing their reverse processes remain significantly less understood than those of their continuous counterparts. In this work, we bridge this gap by analyzing the reverse process dynamics through the lens of thermodynamic entropy production. We propose the entropy production rate as a rigorous proxy for quantifying information generation, deriving as a byproduct a bound on the Wasserstein distance between intermediate states and the data distribution. Leveraging these insights, we introduce two novel sampling schedules that are uniformly spaced with respect to their corresponding physics-inspired metrics: the Entropic Discrete Schedule (EDS), which is defined by maintaining a constant rate of information gain, and the Wasserstein Discrete Schedule (WDS), which is defined by taking equal steps in terms of the Wasserstein distance. We empirically demonstrate that our proposed schedules significantly outperform state-of-the-art strategies across diverse application domains, including synthetic data, music notation, vision and language modeling, consistently achieving superior performance at a lower computational budget.
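The idea of a schedule that is "uniformly spaced with respect to a metric" can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy rate curve, and the time convention are all assumptions made for illustration. It assumes a non-negative rate (standing in for an entropy production rate) is available on a time grid, integrates it with the trapezoid rule, and then picks the times at which the cumulative value advances in equal increments.

```python
import math
from bisect import bisect_left


def cumulative_trapezoid(ts, rates):
    """Cumulative trapezoid-rule integral of a non-negative rate over the grid ts."""
    total = [0.0]
    for i in range(1, len(ts)):
        total.append(total[-1] + 0.5 * (rates[i] + rates[i - 1]) * (ts[i] - ts[i - 1]))
    return total


def uniform_in_metric(ts, rates, num_steps):
    """Return num_steps + 1 times whose cumulative metric values are equally spaced.

    Hypothetical helper: `rates` stands in for whatever quantity the schedule
    should advance at a constant pace (an assumption, not the paper's API).
    """
    cum = cumulative_trapezoid(ts, rates)
    schedule = []
    for k in range(num_steps + 1):
        target = cum[-1] * k / num_steps
        # First grid index whose cumulative value reaches the target.
        j = min(bisect_left(cum, target), len(ts) - 1)
        if j == 0:
            schedule.append(ts[0])
            continue
        lo, hi = cum[j - 1], cum[j]
        # Linear interpolation inside the bracketing grid cell, clamped to [0, 1].
        w = 0.0 if hi == lo else min(1.0, max(0.0, (target - lo) / (hi - lo)))
        schedule.append(ts[j - 1] + w * (ts[j] - ts[j - 1]))
    return schedule


# Toy rate curve (an assumption): most "information" is generated near t = 0.5.
ts = [i / 200 for i in range(201)]
rates = [math.exp(-((t - 0.5) / 0.1) ** 2) for t in ts]
schedule = uniform_in_metric(ts, rates, num_steps=10)
```

With this toy curve, the resulting steps are small where the rate is high and large where it is low, so each step covers the same amount of the integrated quantity. The schedule is returned in increasing grid order; whether the sampler traverses it forward or reversed depends on the time convention of the model.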