CV | May 23, 2023

Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation

arXiv:2305.13607v1 | 35 citations | Has Code
Originality: Incremental advance
AI Analysis

For AI researchers, this work addresses the high computational cost and limited modeling ability of autoregressive image generation, offering an incremental improvement over existing two-stage methods.

The paper tackles redundancy in codebook learning for autoregressive image generation. It proposes a masked vector quantization framework that distinguishes the perceptual importance of image regions, which improves generation quality and efficiency, with faster training and inference.

Existing autoregressive models follow a two-stage generation paradigm: they first learn a codebook in the latent space for image reconstruction, then generate the image autoregressively based on the learned codebook. However, existing codebook learning simply models all local region information of an image without distinguishing the different perceptual importance of regions. This introduces redundancy into the learned codebook, which not only limits the second-stage autoregressive model's ability to model important structure but also leads to high training cost and slow generation. In this study, we borrow the idea of importance perception from classical image coding theory and propose a novel two-stage framework, consisting of Masked Quantization VAE (MQ-VAE) and Stackformer, that relieves the model from modeling redundancy. Specifically, MQ-VAE incorporates an adaptive mask module that masks redundant region features before quantization, and an adaptive de-mask module that recovers the original grid image feature map after quantization so that the original images can be faithfully reconstructed. Stackformer then learns to predict the combination of the next code and its position in the feature map. Comprehensive experiments on various image generation tasks validate the effectiveness and efficiency of our approach. Code will be released at https://github.com/CrossmodalGroup/MaskedVectorQuantization.
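The data flow described in the abstract (mask, quantize, de-mask, then model code and position together) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and not the code in the linked repository. Several parts are hypothetical stand-ins: the importance score (the L2 norm of each region feature), the `keep_ratio` parameter, the single placeholder vector used to fill masked positions during de-masking, and the raster ordering of the (code, position) pairs handed to a Stackformer-style model. The paper's own mask and de-mask modules are learned.

```python
import numpy as np


def adaptive_mask(features, keep_ratio):
    """Keep only the highest-scoring region features of an (H, W, C) grid.

    Hypothetical importance score: the L2 norm of each region's feature.
    Returns the kept features (K, C) and their flat raster positions (K,),
    sorted in raster order.
    """
    h, w, c = features.shape
    flat = features.reshape(h * w, c)
    scores = np.linalg.norm(flat, axis=1)
    k = max(1, int(round(keep_ratio * h * w)))
    # Indices of the k largest scores, then restored to raster order.
    top = np.argsort(-scores, kind="stable")[:k]
    positions = np.sort(top)
    return flat[positions], positions


def quantize(vectors, codebook):
    """Map each vector to its nearest codebook entry (squared Euclidean)."""
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    codes = dists.argmin(axis=1)
    return codes, codebook[codes]


def adaptive_demask(quantized, positions, grid_shape, mask_embedding):
    """Scatter quantized features back onto the full grid.

    Masked positions are filled with one placeholder vector, a hypothetical
    stand-in for the paper's learned adaptive de-mask module.
    """
    h, w = grid_shape
    filler = np.asarray(mask_embedding, dtype=quantized.dtype)
    grid = np.tile(filler, (h * w, 1))
    grid[positions] = quantized
    return grid.reshape(h, w, quantized.shape[1])


def stackformer_targets(codes, positions):
    """(code, position) pairs in raster order: the joint targets a
    Stackformer-style model would predict (the ordering is an assumption)."""
    return list(zip(codes.tolist(), positions.tolist()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(4, 4, 8))
    book = rng.normal(size=(16, 8))
    kept, pos = adaptive_mask(feats, keep_ratio=0.25)
    codes, q = quantize(kept, book)
    recon_grid = adaptive_demask(q, pos, (4, 4), np.zeros(8))
    print(f"kept {len(pos)} of {4 * 4} regions")
    print(stackformer_targets(codes, pos))
```

The point of the sketch is the sequence length: with `keep_ratio=0.25` the second-stage model sees 4 (code, position) pairs instead of 16 grid codes, which is the source of the training and inference savings the abstract claims. The decoder still receives a full grid after de-masking.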

Code Implementations: 1 repo