LG · CV · ML · Jun 2, 2019

Generating Diverse High-Fidelity Images with VQ-VAE-2

arXiv:1906.00446v1 · 2340 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for fast, high-quality image generation in applications where speed is critical, offering an incremental improvement over existing VQ-VAE methods.

The authors tackled the problem of generating diverse, high-fidelity images by scaling and enhancing autoregressive priors in VQ-VAE models, achieving image quality that rivals state-of-the-art GANs on datasets like ImageNet while avoiding issues like mode collapse.
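The priors in a VQ-VAE are fitted over discrete codes, and those codes come from replacing each encoder output vector with its nearest entry in a learned codebook. Below is a minimal sketch of that lookup in plain Python. The codebook and encoder outputs are hypothetical toy values, and training details (the straight-through gradient estimator, the commitment loss, and codebook updates) are deliberately omitted.

```python
# Minimal sketch of the vector-quantization lookup in a VQ-VAE:
# each encoder output vector is replaced by its nearest codebook entry.
# The codebook and inputs used below are hypothetical toy data.
from typing import List, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(z_e: List[Vector], codebook: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """Return the nearest-code index and the quantized vector for each input."""
    indices: List[int] = []
    z_q: List[Vector] = []
    for z in z_e:
        # argmin over codebook entries by squared distance
        k = min(range(len(codebook)), key=lambda j: squared_distance(z, codebook[j]))
        indices.append(k)
        z_q.append(list(codebook[k]))
    return indices, z_q


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]  # K = 3 codes, D = 2
    encoder_outputs = [[0.9, 1.2], [-0.1, 0.2], [-0.8, 1.7]]
    idx, _ = quantize(encoder_outputs, codebook)
    print(idx)  # [1, 0, 2]
```

The integer indices are what an autoregressive prior would be trained on; the decoder only ever sees the quantized vectors.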

We explore the use of Vector Quantized Variational AutoEncoder (VQ-VAE) models for large-scale image generation. To this end, we scale and enhance the autoregressive priors used in VQ-VAE to generate synthetic samples of much higher coherence and fidelity than possible before. We use simple feed-forward encoder and decoder networks, making our model an attractive candidate for applications where the encoding and/or decoding speed is critical. Additionally, VQ-VAE requires sampling an autoregressive model only in the compressed latent space, which is an order of magnitude faster than sampling in the pixel space, especially for large images. We demonstrate that a multi-scale hierarchical organization of VQ-VAE, augmented with powerful priors over the latent codes, is able to generate samples with quality that rivals that of state-of-the-art Generative Adversarial Networks on multifaceted datasets such as ImageNet, while not suffering from GANs' known shortcomings such as mode collapse and lack of diversity.
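The "order of magnitude faster" claim can be sanity-checked by counting sequential sampling steps. The sketch below assumes a 256×256 image sampled one pixel per step in pixel space, versus a two-level hierarchy with a 32×32 top latent map and a 64×64 bottom latent map. These grid sizes are assumptions for illustration, and a step count is only a proxy: it ignores the per-step cost of each network.

```python
# Back-of-the-envelope count of sequential autoregressive sampling steps.
# Assumptions (illustrative only): a 256x256 image generated one pixel per
# step in pixel space, versus a two-level latent hierarchy with a 32x32 top
# map and a 64x64 bottom map, each latent position taking one step.
from typing import List, Tuple


def sequential_steps(grid_sizes: List[Tuple[int, int]]) -> int:
    """Total sequential steps when every grid position is sampled in turn."""
    return sum(h * w for h, w in grid_sizes)


pixel_steps = sequential_steps([(256, 256)])           # 65536 steps
latent_steps = sequential_steps([(32, 32), (64, 64)])  # 1024 + 4096 = 5120 steps
speedup = pixel_steps / latent_steps                   # 12.8x fewer steps

print(f"pixel: {pixel_steps}, latent: {latent_steps}, ratio: {speedup:.1f}x")
```

Under these assumptions the latent hierarchy needs roughly 13x fewer sequential steps, consistent with the order-of-magnitude figure; a pixel model that also samples colour channels one at a time would widen the gap further.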

Code Implementations: 15 repos

Data from Papers with Code (CC-BY-SA-4.0)

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
