LG · CV · ML · May 23, 2025

Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling

Peking U
arXiv:2505.17384v1 · 9 citations · h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in discrete diffusion models for generating complex discrete data, offering incremental improvements to the quality-efficiency trade-off in applications such as image and text generation.

The paper tackles a weakness of discrete diffusion models: their performance degrades with few denoising steps because they model dependencies between dimensions only to a limited extent. It proposes VADD, a framework that augments discrete diffusion with latent variable modeling to capture these correlations. Empirical tests on 2D toy data, image generation, and text generation show improved sample quality, especially at small step counts.
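The few-step failure mode can be seen in a two-dimensional toy case. The sketch below is our own illustration, not code from the paper. It assumes a hypothetical target in which two binary dimensions are always equal, and an ideal denoiser that returns exact per-dimension conditionals, so every error comes from the factorized sampling step itself. Names such as `TARGET`, `marginal` and `invalid_mass` are made up for the example.

```python
from itertools import product

# Hypothetical toy target: two perfectly correlated binary dimensions.
TARGET = {(0, 0): 0.5, (1, 1): 0.5, (0, 1): 0.0, (1, 0): 0.0}


def marginal(dim, given=None):
    """Exact p(x_dim = 1 | observed dims), as an ideal per-dimension denoiser
    would predict it. `given` maps an observed dim index to its value."""
    given = given or {}
    num = den = 0.0
    for x, p in TARGET.items():
        if all(x[d] == v for d, v in given.items()):
            den += p
            if x[dim] == 1:
                num += p
    return num / den


def one_step_joint():
    """Unmask both dims in a single step: each dim is sampled independently
    from its own marginal, so the joint is a product of marginals."""
    p0 = [1 - marginal(0), marginal(0)]
    p1 = [1 - marginal(1), marginal(1)]
    return {(a, b): p0[a] * p1[b] for a, b in product((0, 1), repeat=2)}


def two_step_joint():
    """Unmask dim 0 first, then dim 1 conditioned on the revealed dim 0."""
    joint = {}
    for a in (0, 1):
        pa = marginal(0) if a == 1 else 1 - marginal(0)
        pb1 = marginal(1, given={0: a})
        for b in (0, 1):
            joint[(a, b)] = pa * (pb1 if b == 1 else 1 - pb1)
    return joint


def invalid_mass(joint):
    """Probability placed on pairs that never occur under the target."""
    return sum(p for x, p in joint.items() if TARGET[x] == 0.0)


print(invalid_mass(one_step_joint()))  # 0.5
print(invalid_mass(two_step_joint()))  # 0.0
```

Unmasking both dimensions in one step puts half of the probability mass on pairs that never occur, while revealing them one at a time recovers the target exactly. Fewer steps means more dimensions are decoded in parallel per step, so this kind of error grows as the step count shrinks.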

Discrete diffusion models have recently shown great promise for modeling complex discrete data, with masked diffusion models (MDMs) offering a compelling trade-off between quality and generation speed. MDMs denoise by progressively unmasking multiple dimensions from an all-masked input, but their performance can degrade when using few denoising steps due to limited modeling of inter-dimensional dependencies. In this paper, we propose Variational Autoencoding Discrete Diffusion (VADD), a novel framework that enhances discrete diffusion with latent variable modeling to implicitly capture correlations among dimensions. By introducing an auxiliary recognition model, VADD enables stable training by maximizing a variational lower bound, with amortized inference over the training set. Our approach retains the efficiency of traditional MDMs while significantly improving sample quality, especially when the number of denoising steps is small. Empirical results on 2D toy data, pixel-level image generation, and text generation demonstrate that VADD consistently outperforms MDM baselines.
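The abstract's remedy has two ingredients: a latent variable that lets a factorized decoder express correlations between dimensions, and a recognition model trained through a variational lower bound. The sketch below illustrates only these two ideas by exact enumeration on a toy problem. It is a hedged illustration under our own assumptions, not VADD's architecture or training objective: `PRIOR`, `EPS`, `decoder`, `posterior` and `elbo` are hypothetical names, and the recognition distribution `q` is a plain dictionary rather than an amortized neural network.

```python
import math

# Hypothetical toy model (not the paper's architecture): a binary latent z
# with a uniform prior, and two binary data dimensions.
PRIOR = {0: 0.5, 1: 0.5}
EPS = 0.01  # hypothetical per-dimension "copy error" rate of the decoder


def decoder(x, z):
    """Factorized likelihood p(x | z) = prod_d p(x_d | z).
    Each dimension independently equals z with probability 1 - EPS."""
    prob = 1.0
    for x_d in x:
        prob *= (1 - EPS) if x_d == z else EPS
    return prob


def marginal_likelihood(x):
    """p(x) = sum_z p(z) p(x | z). Mixing over z couples the dimensions,
    even though each p(x | z) is fully factorized."""
    return sum(PRIOR[z] * decoder(x, z) for z in PRIOR)


def posterior(x):
    """Exact posterior p(z | x): the best possible recognition distribution."""
    px = marginal_likelihood(x)
    return {z: PRIOR[z] * decoder(x, z) / px for z in PRIOR}


def elbo(x, q):
    """Variational lower bound on log p(x) for a recognition distribution
    q(z | x): E_q[log p(x | z) + log p(z) - log q(z | x)]."""
    return sum(
        q_z * (math.log(decoder(x, z)) + math.log(PRIOR[z]) - math.log(q_z))
        for z, q_z in q.items()
        if q_z > 0
    )


if __name__ == "__main__":
    # Mismatched pair: far rarer than under a product of marginals.
    print(marginal_likelihood((0, 1)))
    x = (1, 1)
    uniform_q = {0: 0.5, 1: 0.5}
    # A poor q gives a loose bound; the exact posterior makes it tight.
    print(elbo(x, uniform_q), elbo(x, posterior(x)), math.log(marginal_likelihood(x)))
```

With `EPS = 0.01`, the pair (0, 1) gets probability 0.0099, whereas a product of the per-dimension marginals (0.5 each) would give it 0.25. The bound equals log p(x) only when `q` is the exact posterior; in realistic models that posterior is typically intractable, which motivates training an amortized recognition model to approximate it.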

