CV | AI | Jun 1, 2022

DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder

arXiv:2206.00386v1 | 34 citations | h-index: 66
Originality: Incremental advance
AI Analysis

This work addresses image synthesis for applications requiring high-quality outputs, but it is incremental as it combines existing VAE and diffusion methods.

The authors tackled the problem of generating photorealistic images by proposing DiVAE, a VQ-VAE architecture with a diffusion decoder, which achieves state-of-the-art results on ImageNet and produces more realistic images.

Most recent successful image synthesis models are multi-stage processes that combine the advantages of different methods; they always include a VAE-like model that faithfully reconstructs an image from its embedding and a prior model that generates the image embedding. At the same time, diffusion models have shown the capacity to generate high-quality synthetic images. Our work proposes a VQ-VAE architecture with a diffusion decoder (DiVAE) to serve as the reconstruction component in image synthesis. We explore how to feed the image embedding into the diffusion model for the best performance and find that a simple modification of the diffusion model's UNet achieves this. Trained on ImageNet, our model achieves state-of-the-art results and, in particular, generates more photorealistic images. In addition, we apply DiVAE with an auto-regressive generator to conditional synthesis tasks to produce more human-feeling and detailed samples.
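
As a rough illustration of the two ingredients named in the abstract, the sketch below shows nearest-neighbour codebook lookup (the quantization step of a generic VQ-VAE) and the standard DDPM forward noising step that a diffusion decoder is trained to invert. It is a toy under stated assumptions, not the authors' implementation: the function names, the codebook values, and the linear beta schedule are all hypothetical, and the way the embedding enters the UNet (the modification the paper explores) is not reproduced here.

```python
import math
import random


def quantize(z, codebook):
    """Return (index, code) of the codebook entry nearest to z (squared L2)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    idx = min(range(len(codebook)), key=lambda k: sqdist(z, codebook[k]))
    return idx, codebook[idx]


def alpha_bar(t, betas):
    """Cumulative product of (1 - beta_s) for s = 0..t."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod


def add_noise(x0, t, betas, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    abar = alpha_bar(t, betas)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(abar) * x + math.sqrt(1.0 - abar) * e
          for x, e in zip(x0, eps)]
    return xt, eps


# Toy data (assumed values, for illustration only).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
idx, code = quantize([0.9, 0.8], codebook)

# Linear beta schedule with 1000 steps (a common choice, assumed here).
betas = [1e-4 + (0.02 - 1e-4) * i / 999 for i in range(1000)]
rng = random.Random(0)
xt, eps = add_noise([0.5, -0.5, 0.25], t=500, betas=betas, rng=rng)
```

In a full model, a denoising network would take `xt`, the timestep, and the quantized embedding `code` as inputs and be trained to predict `eps`; the decoder then reconstructs the image by iteratively denoising from pure noise while conditioned on the embedding.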

