What Exactly Does Guidance Do in Masked Discrete Diffusion Models
This work provides theoretical insight into guidance mechanisms for researchers in generative modeling, though it is incremental, as it builds on existing diffusion-model frameworks.
The paper analyzes how classifier-free guidance shapes sampling in masked discrete diffusion models. It shows that guidance amplifies class-specific regions and suppresses shared ones, and that for large guidance strengths the total variation along the reverse dynamics decays at a rate that is double-exponential in the guidance strength.
We study masked discrete diffusion models with classifier-free guidance (CFG). Assuming neither score error nor discretization error, we derive an explicit solution to the guided reverse dynamics, which lets us characterize precisely how guidance influences the sampling behavior. When the full data distribution is a mixture over classes and the goal is to sample from a specific class, guidance amplifies class-specific regions while suppressing regions shared with other classes. This effect depends on the guidance strength $w$ and induces distinct covariance structures in the sampled distribution. Notably, we observe quantitatively different behaviors in $1$D and $2$D. We also show that for large $w$, the decay rate of the total variation ($\mathrm{TV}$) along the reverse dynamics is double-exponential in $w$ for both $1$D and $2$D. These findings highlight the role of guidance not just in shaping the output distribution but also in controlling the dynamics of the sampling trajectory. Our theoretical analysis is supported by experiments that illustrate the geometric effects of guidance and its impact on convergence.
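To make "amplifies class-specific regions while suppressing shared ones" concrete, here is a minimal sketch on a toy 1D state space. It assumes the common CFG convention in which the guided target is proportional to $p(x \mid c)^{1+w}\, p(x)^{-w}$; the paper's exact parameterization may differ. It also only computes this tilted target distribution and does not simulate the guided reverse dynamics the paper analyzes. The function name `cfg_tilt` and the toy distributions are hypothetical, chosen for illustration. Because the full distribution is a class mixture, the ratio $p(x \mid c)/p(x)$ equals $1/\pi_c$ on regions owned only by class $c$ and is smaller on regions shared with other classes, so raising that ratio to the power $w$ shifts mass toward the class-specific states.

```python
def cfg_tilt(p_cond, p_uncond, w):
    """Normalize p_cond(x)**(1 + w) * p_uncond(x)**(-w) over a finite state space.

    Assumed CFG convention (hypothetical helper, not the paper's code).
    States with zero conditional mass stay at zero.
    """
    weights = [pc ** (1 + w) * pu ** (-w) if pc > 0 else 0.0
               for pc, pu in zip(p_cond, p_uncond)]
    z = sum(weights)
    return [v / z for v in weights]


# Toy 1D example with 3 states and two equally likely classes (pi_A = pi_B = 0.5).
# State 0 belongs only to class A, state 1 is shared, state 2 belongs only to B.
p_a = [0.5, 0.5, 0.0]
p_b = [0.0, 0.5, 0.5]
p_full = [0.5 * a + 0.5 * b for a, b in zip(p_a, p_b)]  # [0.25, 0.5, 0.25]

for w in [0.0, 1.0, 3.0]:
    print(w, [round(q, 4) for q in cfg_tilt(p_a, p_full, w)])
# 0.0 [0.5, 0.5, 0.0]
# 1.0 [0.6667, 0.3333, 0.0]
# 3.0 [0.8889, 0.1111, 0.0]
```

In this toy case, the mass left on the shared state is $1/(1 + 2^w)$, so it shrinks as $w$ grows while the class-specific state absorbs the rest. This illustrates only the geometric effect on the target distribution. The double-exponential $\mathrm{TV}$ decay concerns the reverse dynamics and is not reproduced by this sketch.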