CV · Jul 21, 2022

Auto-regressive Image Synthesis with Integrated Quantization

arXiv:2207.10776v1 · 16 citations · h-index: 68
Originality: Incremental advance
AI Analysis

This work addresses the problem of limited diversity in conditional image generation for applications like creative design or data augmentation, though it appears incremental as it builds on existing auto-regressive and quantization techniques.

The paper tackles the challenge of generating diverse yet high-fidelity images in conditional image generation by introducing a framework that integrates CNN inductive bias with auto-regressive modeling, achieving superior performance in multiple tasks compared to state-of-the-art methods.

Deep generative models have made notable progress in realistic image synthesis with a wide variety of conditional inputs, yet generating diverse, high-fidelity images remains a major challenge in conditional image generation. This paper presents a versatile framework for conditional image generation that combines the inductive bias of CNNs with the powerful sequence modeling of auto-regression, which naturally leads to diverse image generation. Instead of independently quantizing the features of multiple domains as in prior research, we design an integrated quantization scheme with a variational regularizer that mingles the feature discretization in multiple domains and markedly boosts auto-regressive modeling performance. Notably, the variational regularizer makes it possible to regularize feature distributions in incomparable latent spaces by penalizing the intra-domain variations of distributions. In addition, we design a Gumbel sampling strategy that incorporates distribution uncertainty into the auto-regressive training procedure. Gumbel sampling substantially mitigates exposure bias, which often causes misalignment between the training and inference stages and severely impairs inference performance. Extensive experiments over multiple conditional image generation tasks show that our method achieves superior diverse image generation performance, both qualitatively and quantitatively, compared with the state of the art.
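The abstract does not spell out the paper's Gumbel sampling strategy, so the following is only a minimal sketch of the generic Gumbel-max trick that such strategies usually build on, not the authors' implementation. It shows how adding Gumbel noise to a model's predicted logits and taking the argmax draws a token index from the predicted distribution, which is one way sampled tokens (rather than only ground-truth tokens) could be fed back during auto-regressive training. All function names, the `temperature` parameter, and the example logits are assumptions made for illustration.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one sample from the standard Gumbel(0, 1) distribution."""
    # Inverse-CDF method: G = -log(-log(U)) with U ~ Uniform(0, 1).
    u = rng.random()
    # Clamp U away from 0 and 1 so neither logarithm receives 0.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_max_sample(logits, rng, temperature=1.0):
    """Sample an index from softmax(logits / temperature) via Gumbel-max."""
    # Perturb each scaled logit with independent Gumbel noise, then argmax.
    perturbed = [value / temperature + sample_gumbel(rng) for value in logits]
    return max(range(len(logits)), key=perturbed.__getitem__)


def softmax(logits):
    """Numerically stable softmax, used here only to check the sampler."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def empirical_frequencies(logits, num_samples, seed=0):
    """Estimate how often each index is drawn by the Gumbel-max sampler."""
    rng = random.Random(seed)
    counts = [0] * len(logits)
    for _ in range(num_samples):
        counts[gumbel_max_sample(logits, rng)] += 1
    return [count / num_samples for count in counts]


if __name__ == "__main__":
    # Hypothetical logits over a 3-entry codebook (illustrative values only).
    example_logits = [2.0, 1.0, 0.1]
    print("softmax probabilities:", softmax(example_logits))
    print("sampled frequencies:  ", empirical_frequencies(example_logits, 20000))
```

With enough samples, the sampled frequencies should approach the softmax probabilities, which is the property that makes the trick a valid way to inject the model's own predictive uncertainty into the token sequence.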

