CV LG · Mar 10, 2025

Effective and Efficient Masked Image Generation Models

Zebin You, Jingyang Ou, Xiaolu Zhang, Jun Hu, Jun Zhou, Chongxuan Li

arXiv:2503.07197v2 · 11 citations · h-index: 8 · Has Code · ICML

Originality: Incremental advance

AI Analysis

This work targets both the efficiency and the sample quality of image generation. It is an incremental advance, since it builds on existing masked image generation models and masked diffusion models.

The paper improves masked image generation models by unifying them with masked diffusion models in a single framework. The resulting model, eMIGM, outperforms VAR on ImageNet 256x256 with a similar number of function evaluations (NFEs) and model parameters, and at larger scale it performs comparably to state-of-the-art continuous diffusion models while using less than 40% of the NFEs.

Although masked image generation models and masked diffusion models are designed with different motivations and objectives, we observe that they can be unified within a single framework. Building upon this insight, we carefully explore the design space of training and sampling, identifying key factors that contribute to both performance and efficiency. Based on the improvements observed during this exploration, we develop our model, referred to as eMIGM. Empirically, eMIGM demonstrates strong performance on ImageNet generation, as measured by Fréchet Inception Distance (FID). In particular, on ImageNet 256x256, with a similar number of function evaluations (NFEs) and model parameters, eMIGM outperforms the seminal VAR. Moreover, as the NFEs and model parameters increase, eMIGM achieves performance comparable to the state-of-the-art continuous diffusion models while requiring less than 40% of the NFEs. Additionally, on ImageNet 512x512, with only about 60% of the NFEs, eMIGM outperforms the state-of-the-art continuous diffusion models. Code is available at https://github.com/ML-GSAI/eMIGM.
