CV | AI | LG | Mar 19, 2025

Di$\mathtt{[M]}$O: Distilling Masked Diffusion Models into One-step Generator

arXiv:2503.15457v1 | 4 citations | h-index: 21
Originality: Highly original
AI Analysis

This paper addresses inefficient inference in generative modeling, a problem relevant to AI researchers and practitioners. It is assessed as an incremental improvement that introduces novel distillation techniques.

The paper tackles the slow inference of masked diffusion models by proposing Di[M]O, a method that distills them into a one-step generator, achieving performance competitive with multi-step models while drastically reducing inference time.

Masked Diffusion Models (MDMs) have emerged as a powerful generative modeling technique. Despite their remarkable results, they typically suffer from slow inference because sampling requires several steps. In this paper, we propose Di$\mathtt{[M]}$O, a novel approach that distills masked diffusion models into a one-step generator. Di$\mathtt{[M]}$O addresses two key challenges: (1) the intractability of using intermediate-step information for one-step generation, which we solve through token-level distribution matching that optimizes the model's output logits within an on-policy framework, with the help of an auxiliary model; and (2) the lack of entropy in the initial distribution, which we address through a token initialization strategy that injects randomness while staying close to the teacher's training distribution. We show Di$\mathtt{[M]}$O's effectiveness on both class-conditional and text-conditional image generation, achieving performance competitive with multi-step teacher outputs while drastically reducing inference time. To our knowledge, we are the first to successfully achieve one-step distillation of masked diffusion models and the first to apply discrete distillation to text-to-image generation, opening new paths for efficient generative modeling.
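The two ingredients named in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation, and every detail below is an assumption made for illustration: the `MASK_ID` value, the `random_ratio` parameter, uniform sampling of the random tokens, and the use of a plain KL divergence (the abstract does not say which divergence is used or exactly which pair of models is compared). `init_tokens` shows one way to add entropy to an otherwise all-mask input, and `token_level_kl` shows what a per-token comparison of two sets of logits might look like.

```python
import math
import random

MASK_ID = -1  # hypothetical id standing in for the [M] token


def init_tokens(seq_len, vocab_size, random_ratio, rng):
    """Return a mostly-masked sequence with some positions set to random tokens.

    An all-mask input is deterministic, so it carries no entropy. Replacing a
    fraction of positions with uniformly random tokens (an assumed scheme)
    adds randomness while the input still resembles a partially masked
    sequence of the kind a masked diffusion teacher is trained on.
    """
    n_random = round(seq_len * random_ratio)
    random_positions = set(rng.sample(range(seq_len), n_random))
    return [
        rng.randrange(vocab_size) if i in random_positions else MASK_ID
        for i in range(seq_len)
    ]


def log_softmax(logits):
    """Numerically stable log-softmax over one token's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_level_kl(p_logits, q_logits):
    """Mean over token positions of KL(p || q), computed from raw logits.

    Each argument is a list of per-position logit vectors. Which models
    supply p and q (for example a teacher and an auxiliary model) is an
    assumption of this sketch.
    """
    total = 0.0
    for p_row, q_row in zip(p_logits, q_logits):
        log_p, log_q = log_softmax(p_row), log_softmax(q_row)
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(p_logits)


tokens = init_tokens(seq_len=16, vocab_size=100, random_ratio=0.25,
                     rng=random.Random(0))
same = token_level_kl([[1.0, 2.0]], [[1.0, 2.0]])
different = token_level_kl([[0.0, 0.0]], [[0.0, math.log(3)]])
```

In this sketch the divergence is zero when both sets of logits agree and positive otherwise, so it could serve as a per-token training signal. The mixing ratio controls the trade-off between input randomness and closeness to a fully masked sequence.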

