CV · LG · IV · Mar 23, 2023

High Fidelity Image Synthesis With Deep VAEs In Latent Space

arXiv:2303.13714v1 · 13 citations · h-index: 5 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of efficient and realistic image synthesis for computer vision applications, representing an incremental improvement over existing methods.

The paper tackles high-resolution image generation by using a two-stage approach where an autoencoder compresses images into semantic features, then a hierarchical VAE models these features to avoid modeling fine-grained details. This method achieves a FID score of 9.34 on ImageNet-256, comparable to BigGAN.

We present fast, realistic image generation on high-resolution, multimodal datasets using hierarchical variational autoencoders (VAEs) trained on a deterministic autoencoder's latent space. In this two-stage setup, the autoencoder compresses the image into its semantic features, which are then modeled with a deep VAE. With this method, the VAE avoids modeling the fine-grained details that constitute the majority of the image's code length, allowing it to focus on learning the image's structural components. We demonstrate the effectiveness of our two-stage approach, achieving a FID of 9.34 on the ImageNet-256 dataset, which is comparable to BigGAN. We make our implementation available online.
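The two-stage data flow described in the abstract can be sketched in a few lines. This is a toy illustration under loud assumptions, not the authors' implementation: average pooling and nearest-neighbour upsampling stand in for the learned deterministic autoencoder, an untrained two-level Gaussian sampler stands in for the deep hierarchical VAE, and every function name and parameter (`ae_encode`, `ae_decode`, `sample_latent`, `generate`, `factor`, `detail_scale`) is hypothetical.

```python
import random


def ae_encode(image, factor=2):
    """Stand-in for the stage-1 encoder: average-pool factor x factor blocks."""
    height, width = len(image), len(image[0])
    return [
        [
            sum(image[i + di][j + dj]
                for di in range(factor) for dj in range(factor)) / factor ** 2
            for j in range(0, width, factor)
        ]
        for i in range(0, height, factor)
    ]


def ae_decode(latent, factor=2):
    """Stand-in for the stage-1 decoder: nearest-neighbour upsampling."""
    height, width = len(latent) * factor, len(latent[0]) * factor
    return [
        [latent[i // factor][j // factor] for j in range(width)]
        for i in range(height)
    ]


def sample_latent(rng, size=4, detail_scale=0.1):
    """Stand-in for the stage-2 prior: a two-level top-down Gaussian hierarchy."""
    z_top = rng.gauss(0.0, 1.0)  # coarse variable shared by the whole grid
    return [
        [z_top + detail_scale * rng.gauss(0.0, 1.0) for _ in range(size)]
        for _ in range(size)
    ]


def generate(rng, size=4, factor=2):
    """Sample in latent space, then decode the sample to pixel space."""
    return ae_decode(sample_latent(rng, size), factor)


# In the two-stage setup, stage 2 is fit to encoded images, not raw pixels.
toy_image = [[float(i + j) for j in range(8)] for i in range(8)]
toy_latent = ae_encode(toy_image)    # 8x8 image -> 4x4 latent grid
sample = generate(random.Random(0))  # 4x4 latent sample -> 8x8 image
```

The point of the sketch is the division of labour: the generative model only ever sees the smaller latent grid, and the deterministic decoder is responsible for mapping a latent sample back to pixels.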

Code Implementations: 2 repos
