LG · May 24, 2025

Partition Generative Modeling: Masked Modeling Without Masks

arXiv:2505.18883v2 · 5 citations · h-index: 8
Originality: Highly original
AI Analysis

This work addresses a computational bottleneck in masked generative modeling, which matters for applications that need fast, high-quality text and image generation, and it reports a significant sampling speedup over existing methods.

The paper tackles the inefficiency of masked generative models (MGMs) during early sampling by introducing Partition Generative Models (PGM), which partition tokens and use sparse attention to block information flow, achieving at least 5× faster sampling on OpenWebText and 7.5× higher throughput on ImageNet with minimal quality loss.

Masked generative models (MGMs) are widely used to capture complex data and enable faster generation than autoregressive (AR) models through parallel decoding. However, MGMs typically operate on fixed-length inputs, which can be inefficient: early in sampling, most tokens are masked and carry no information, leading to wasted computation. In contrast, AR models process only previously generated tokens, making early iterations faster. In this work, we introduce the Partition Generative Model (PGM), a novel approach that combines the strengths of AR models and MGMs. Rather than masking, PGM partitions tokens into two groups and employs sparse attention to block information flow between them. Since there is no information flow between partitions, the model can process only the previously generated tokens during sampling, while retaining the ability to generate tokens in parallel and in any order. On OpenWebText, PGMs offer at least $5\times$ improvements in sampling latency and throughput, while producing samples with superior Generative Perplexity, compared to Masked Diffusion Language Models. On ImageNet, PGMs achieve $7.5\times$ higher throughput than MaskGIT, with only a slight increase in FID (5.54 vs. 5.35). With twice as many sampling steps, the FID drops to 4.56 while remaining $3.9\times$ faster than MaskGIT. Finally, PGMs integrate seamlessly with MGM distillation, providing further inference speedups.
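
To make the partitioning idea concrete, here is a minimal sketch of one way to build an attention mask that blocks information flow between two token groups. It is an illustration under stated assumptions, not the paper's implementation: the helper names `partition_attention_mask` and `attention_pairs`, the same-group-only attention rule, and the sequence lengths are all hypothetical, and the actual PGM architecture may differ.

```python
def partition_attention_mask(group_ids):
    """Return mask[i][j] == True when position i may attend to position j.

    Assumption for illustration: a token attends only to tokens in its
    own group, so no information crosses the partition boundary.
    """
    return [[gi == gj for gj in group_ids] for gi in group_ids]


def attention_pairs(num_tokens):
    """Number of query-key pairs in dense self-attention over num_tokens."""
    return num_tokens * num_tokens


# Hypothetical assignment of six tokens to groups 0 and 1.
group_ids = [0, 1, 0, 0, 1, 1]
mask = partition_attention_mask(group_ids)

# Positions that token 0 (group 0) is allowed to attend to.
visible_to_0 = [j for j, allowed in enumerate(mask[0]) if allowed]

# Rough cost comparison early in sampling, with made-up sizes:
# a fixed-length model attends over all 1024 positions, while a model
# that processes only the 64 tokens generated so far attends over 64.
full_cost = attention_pairs(1024)
partial_cost = attention_pairs(64)
cost_ratio = full_cost // partial_cost

print(visible_to_0)
print(cost_ratio)
```

The ratio printed here only counts attention pairs for the chosen toy sizes; it is not an estimate of the speedups reported in the abstract, which depend on the full model and sampling schedule.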

Code Implementations: 1 repo