ML · LG · Oct 4, 2025

Self-Speculative Masked Diffusions

arXiv:2510.03929v1 · 8 citations · h-index: 26
Originality: Incremental advance
AI Analysis

This work addresses the high computational cost of sampling from masked diffusion generative models, a problem that matters to researchers and practitioners. It is an incremental advance because it builds on existing masked diffusion methods.

The paper tackles the computational inefficiency of masked diffusion models for discrete data by introducing self-speculative masked diffusions, which generate non-factorized predictions to reduce function evaluations, achieving a ~2x reduction in network forward passes for tasks like GPT2-scale text modeling and protein sequence generation.
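Why factorized predictions limit how many positions can be unmasked per step can be shown with a tiny worked example. The sketch below assumes a hypothetical target distribution over two masked positions whose tokens are perfectly correlated. It computes, exactly and without randomness, the distribution obtained by sampling both positions at once from their per-position marginals (what a factorized predictor outputs), and measures how much probability lands on sequences the target never produces. The names `joint`, `marginal`, and `factorized_joint` are illustrative and do not come from the paper.

```python
from itertools import product

# Hypothetical toy target: two masked positions that are always equal,
# "AA" or "BB", each with probability 0.5.
joint = {("A", "A"): 0.5, ("B", "B"): 0.5}
vocab = ["A", "B"]


def marginal(joint, position, vocab):
    """Per-position marginal: what a factorized predictor would output."""
    return {v: sum(p for seq, p in joint.items() if seq[position] == v) for v in vocab}


def factorized_joint(joint, vocab, length=2):
    """Distribution induced by sampling every position independently in one step."""
    marginals = [marginal(joint, i, vocab) for i in range(length)]
    result = {}
    for seq in product(vocab, repeat=length):
        prob = 1.0
        for i, tok in enumerate(seq):
            prob *= marginals[i][tok]
        result[seq] = prob
    return result


approx = factorized_joint(joint, vocab)
# Probability mass placed on sequences ("AB", "BA") that the target never produces.
invalid_mass = sum(p for seq, p in approx.items() if joint.get(seq, 0.0) == 0.0)
print(approx)
print(invalid_mass)
```

Half of the factorized samples are invalid here. Unmasking one position per forward pass would avoid the error, because the second prediction would be conditioned on the first token, but that costs one network evaluation per position, which is the trade-off non-factorized predictions aim to improve.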

We present self-speculative masked diffusions, a new class of masked diffusion generative models for discrete data that require significantly fewer function evaluations to generate samples. Standard masked diffusion models predict factorized logits over currently masked positions. A number of masked positions are then sampled; however, the factorization approximation means that sampling too many positions in one go leads to poor sample quality. As a result, many simulation steps, and therefore many neural network function evaluations, are required to generate high-quality data. We reduce the computational burden by generating non-factorized predictions over masked positions. This is achieved by modifying the final transformer attention mask from non-causal to causal, enabling draft token generation and parallel validation via a novel, model-integrated speculative sampling mechanism. This results in a non-factorized predictive distribution over masked positions in a single forward pass. We apply our method to GPT2-scale text modelling and protein sequence generation, finding that we can achieve a ~2x reduction in the required number of network forward passes relative to standard masked diffusion models.
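The abstract does not give the exact validation rule, so the sketch below shows only the standard speculative sampling acceptance step from the speculative decoding literature, as a reference point for "parallel validation" of draft tokens. The paper's model-integrated mechanism may differ. It assumes hypothetical inputs: one drafted token per masked position, the draft distribution `q_probs` that produced each token, and a target distribution `p_probs` for the same positions, both computed in a single pass. Each draft is accepted with probability min(1, p(x)/q(x)). On the first rejection, a replacement is drawn from the normalized residual max(0, p - q) and the remaining drafts are discarded. This rule makes each emitted token exactly distributed according to the target.

```python
import numpy as np


def speculative_verify(drafts, q_probs, p_probs, rng):
    """Standard speculative-sampling verification of a block of draft tokens.

    drafts:  drafted token ids, one per masked position (hypothetical input)
    q_probs: draft distributions that produced each token, shape (k, vocab)
    p_probs: target distributions at the same positions, shape (k, vocab),
             assumed to be conditioned on the preceding drafts
    Returns the accepted prefix, ending in a corrected token if a draft is rejected.
    """
    accepted = []
    for i, tok in enumerate(drafts):
        p, q = p_probs[i], q_probs[i]
        # Accept the draft with probability min(1, p(x) / q(x)).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            accepted.append(int(tok))
            continue
        # On rejection, resample from the normalized residual max(0, p - q).
        residual = np.maximum(p - q, 0.0)
        residual /= residual.sum()
        accepted.append(int(rng.choice(len(p), p=residual)))
        # Later drafts were conditioned on the rejected token, so drop them.
        break
    return accepted


# Usage with hypothetical two-position, two-token distributions.
rng = np.random.default_rng(0)
q = np.array([[0.7, 0.3], [0.5, 0.5]])
p = np.array([[0.2, 0.8], [0.5, 0.5]])
drafts = [int(rng.choice(2, p=q[i])) for i in range(2)]
print(speculative_verify(drafts, q, p, rng))
```

Full speculative decoding also samples one extra "bonus" token from the target when every draft is accepted. That step is omitted here because the abstract does not say whether the paper uses it.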
