CV · Oct 17, 2024

Unlocking the Capabilities of Masked Generative Models for Image Synthesis via Self-Guidance

arXiv:2410.13136v1 · 5 citations · h-index: 5 · NIPS
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in image synthesis by improving the performance of masked generative models (MGMs). It is an incremental advance, since it adapts guidance methods already established for diffusion models.

The paper tackles the underperformance of masked generative models (MGMs) in image synthesis relative to continuous diffusion models by proposing a self-guidance sampling method that improves generation quality. It reports a superior quality-diversity trade-off, outperforming existing MGM sampling methods at lower training and sampling cost.

Masked generative models (MGMs) have shown impressive generative ability while sampling in an order of magnitude fewer steps than continuous diffusion models. However, MGMs still underperform recent, well-developed continuous diffusion models of similar size in image synthesis, in terms of both the quality and the diversity of generated samples. A key factor in the performance of continuous diffusion models is their guidance methods, which enhance sample quality at the expense of diversity. In this paper, we extend these guidance methods to a generalized guidance formulation for MGMs and propose a self-guidance sampling method, which leads to better generation quality. The proposed approach leverages an auxiliary task for semantic smoothing in vector-quantized token space, analogous to Gaussian blur in continuous pixel space. Equipped with a parameter-efficient fine-tuning method and high-temperature sampling, MGMs with the proposed self-guidance achieve a superior quality-diversity trade-off, outperforming existing sampling methods for MGMs at lower training and sampling cost. Extensive experiments across various sampling hyperparameters confirm the effectiveness of the proposed self-guidance.
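The guidance idea in the abstract can be illustrated with a small sketch. It is not the paper's formulation: it assumes the common extrapolation form used by diffusion guidance, applied to per-token logits, where `smoothed_logits` is a hypothetical stand-in for the prediction produced under the auxiliary semantic-smoothing task. The function names, the `scale` parameter, and the toy values are all illustrative assumptions.

```python
import math
import random


def guided_logits(logits, smoothed_logits, scale):
    """Push logits away from the smoothed prediction (assumed guidance form).

    scale = 0 leaves the logits unchanged; larger values extrapolate further.
    """
    return [l + scale * (l - s) for l, s in zip(logits, smoothed_logits)]


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one token position."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one codebook index from the temperature-scaled distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy example with a 2-entry codebook (values are made up).
plain = [2.0, 0.0]
smoothed = [1.0, 1.0]
guided = guided_logits(plain, smoothed, scale=1.0)
token = sample_token(guided, temperature=2.0, rng=random.Random(0))
```

In this sketch, guidance sharpens the distribution toward the tokens the unsmoothed prediction prefers, while a higher temperature flattens it again. That mirrors the quality-diversity trade-off the abstract describes, though the actual method may combine the two differently.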

Code Implementations: 1 repo
