LG, Feb 27

Learning Generation Orders for Masked Discrete Diffusion Models via Variational Inference

David Fox, Sam Bowyer, Song Liu, Laurence Aitchison, Raul Santos-Rodriguez, Mengyue Yang
arXiv:2602.23968v1, 1 citation
AI Analysis

This work addresses the efficiency challenge in generative modelling with a learning-based approach to parallel generation. It is incremental in that it builds on existing masked diffusion models.

The paper tackles the problem of balancing parallel generation against sample quality in masked discrete diffusion models by proposing a variational inference framework for learning generation orders. In preliminary experiments on GSM8K, the method achieves 33.1% accuracy with an average of only 4 generation steps, compared to 23.7-29.0% for standard competitor methods in the same number of steps.

Masked discrete diffusion models (MDMs) are a promising new approach to generative modelling, offering parallel token generation and therefore greater efficiency than their autoregressive counterparts. However, achieving an optimal balance between parallel generation and sample quality remains an open problem. Current approaches primarily address this issue through fixed, heuristic parallel sampling methods. Some recent learning-based approaches to this problem exist, but its formulation from the perspective of variational inference remains underexplored. In this work, we propose a variational inference framework for learning parallel generation orders for MDMs. As part of our method, we propose a parameterisation for the approximate posterior of generation orders which facilitates parallelism and efficient sampling during training. Using this method, we conduct preliminary experiments on the GSM8K dataset, where our method performs competitively against heuristic sampling strategies in the regime of highly parallel generation. For example, our method achieves 33.1% accuracy with an average of only 4 generation steps, compared to 23.7-29.0% accuracy achieved by standard competitor methods in the same number of steps. We believe further experiments and analysis of the method will yield valuable insights into the problem of parallel generation with MDMs.
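
To make the idea of a fixed, heuristic parallel sampling strategy concrete, the following is a minimal sketch of confidence-based parallel unmasking in a fixed number of steps. It is an illustration under stated assumptions, not the method proposed in the paper: `toy_model` is a hypothetical stand-in that returns random tokens and random confidences, and the schedule (reveal an equal share of the remaining masked positions at each step) is just one plausible heuristic.

```python
import math
import random

MASK = None  # placeholder value for a still-masked position


def toy_model(tokens, vocab, rng):
    """Hypothetical stand-in for an MDM (assumption, not a real model).

    Returns {position: (predicted_token, confidence)} for every masked position.
    """
    return {
        i: (rng.choice(vocab), rng.random())
        for i, t in enumerate(tokens)
        if t is MASK
    }


def parallel_unmask(length, num_steps, vocab, seed=0):
    """Unmask `length` positions in at most `num_steps` parallel steps.

    At each step, the most confident predictions are committed, with the
    number committed chosen so that all positions are filled by the last step.
    Returns the final tokens and the list of positions revealed per step,
    i.e. the (heuristic) generation order.
    """
    rng = random.Random(seed)
    tokens = [MASK] * length
    order = []
    for step in range(num_steps):
        preds = toy_model(tokens, vocab, rng)
        if not preds:
            break
        remaining_steps = num_steps - step
        k = math.ceil(len(preds) / remaining_steps)
        # Pick the k masked positions with the highest confidence.
        chosen = sorted(preds, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in chosen:
            tokens[i] = preds[i][0]
        order.append(sorted(chosen))
    return tokens, order


if __name__ == "__main__":
    tokens, order = parallel_unmask(length=16, num_steps=4, vocab=["a", "b", "c"])
    print(tokens)
    print(order)
```

Fewer steps mean more tokens are committed per step without seeing each other, which is the quality trade-off the abstract describes. A learned approach would replace the fixed confidence rule with a trained policy over which positions to reveal.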
