CV | LG | May 19, 2017

Multi-Stage Variational Auto-Encoders for Coarse-to-Fine Image Generation

arXiv:1705.07202v1 | 81 citations
Originality: Incremental advance
AI Analysis

This work addresses image quality issues in VAE-based generation for computer vision applications, but it is incremental as it builds on existing VAE methods with architectural modifications.

The paper tackles blurry image generation in variational auto-encoders (VAEs) by proposing a multi-stage coarse-to-fine framework, which produces sharper images on the MNIST and CelebA datasets than the original VAE.

Variational auto-encoder (VAE) is a powerful unsupervised learning framework for image generation. One drawback of VAE is that it generates blurry images due to its Gaussianity assumption and thus L2 loss. To allow the generation of high quality images by VAE, we increase the capacity of decoder network by employing residual blocks and skip connections, which also enable efficient optimization. To overcome the limitation of L2 loss, we propose to generate images in a multi-stage manner from coarse to fine. In the simplest case, the proposed multi-stage VAE divides the decoder into two components in which the second component generates refined images based on the coarse images generated by the first component. Since the second component is independent of the VAE model, it can employ other loss functions beyond the L2 loss and different model architectures. The proposed framework can be easily generalized to contain more than two components. Experiment results on the MNIST and CelebA datasets demonstrate that the proposed multi-stage VAE can generate sharper images as compared to those from the original VAE.
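The two-component decoder described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the layer sizes (`D`, `H`, `Z`), the dense layers, the residual-style refinement, the clipping step, and the choice of an L1 loss for the second component are all assumptions made for the example. The abstract only says that the second component may use a loss other than L2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: flattened 8x8 image, hidden width, latent dimension.
D, H, Z = 64, 32, 8

def init_linear(n_in, n_out):
    """Small random weights and zero bias for one dense layer (illustrative only)."""
    return rng.normal(0.0, 0.05, size=(n_in, n_out)), np.zeros(n_out)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

params = {
    "enc": init_linear(D, H),
    "mu": init_linear(H, Z),
    "logvar": init_linear(H, Z),
    "dec1": init_linear(Z, D),    # first component: latent code -> coarse image
    "dec2_a": init_linear(D, H),  # second component: coarse image -> correction
    "dec2_b": init_linear(H, D),
}

def forward(x, params, rng):
    """Encode x, sample z, then decode in two stages (coarse, then refined)."""
    W, b = params["enc"]
    h = np.tanh(x @ W + b)
    W, b = params["mu"]
    mu = h @ W + b
    W, b = params["logvar"]
    logvar = h @ W + b
    # Reparameterization: z = mu + sigma * eps with eps ~ N(0, I).
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
    W, b = params["dec1"]
    coarse = sigmoid(z @ W + b)
    # Assumed refinement: skip connection from the coarse image plus a learned residual.
    Wa, ba = params["dec2_a"]
    Wb, bb = params["dec2_b"]
    fine = np.clip(coarse + np.tanh(coarse @ Wa + ba) @ Wb + bb, 0.0, 1.0)
    return mu, logvar, coarse, fine

def losses(x, mu, logvar, coarse, fine):
    """Stage 1: L2 reconstruction + KL (standard VAE). Stage 2: L1 (an assumption)."""
    l2 = np.mean(np.sum((x - coarse) ** 2, axis=1))
    kl = np.mean(-0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1))
    l1 = np.mean(np.sum(np.abs(x - fine), axis=1))
    return {"vae": l2 + kl, "refine": l1}

x = rng.random((4, D))  # stand-in batch of four images with values in [0, 1)
mu, logvar, coarse, fine = forward(x, params, rng)
out = losses(x, mu, logvar, coarse, fine)
print(coarse.shape, fine.shape, sorted(out))
```

The sketch computes only a forward pass and the two loss values. In an actual training setup, the two losses would typically be optimized with a framework that provides automatic differentiation, and how gradients flow between the two components is a design decision the abstract does not specify.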

Code Implementations: 1 repo
