Conditional Image Generation by Conditioning Variational Auto-Encoders
This work addresses efficient, high-quality conditional image generation for applications such as inpainting. The contribution is incremental in that it builds on existing VAE methods.
The paper develops a conditional variational auto-encoder that reuses a pretrained unconditional VAE to reduce training cost. On image inpainting, it represents the inherent uncertainty more faithfully than state-of-the-art GAN-based approaches.
We present a conditional variational auto-encoder (VAE) which, to avoid the substantial cost of training from scratch, uses an architecture and training objective capable of leveraging a foundation model in the form of a pretrained unconditional VAE. To train the conditional VAE, we only need to train an artifact to perform amortized inference over the unconditional VAE's latent variables given a conditioning input. We demonstrate our approach on tasks including image inpainting, for which it outperforms state-of-the-art GAN-based approaches at faithfully representing the inherent uncertainty. We conclude by describing a possible application of our inpainting model, in which it is used to perform Bayesian experimental design for the purpose of guiding a sensor.
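The idea of training only an amortized inference network on top of a frozen VAE can be illustrated with a toy sketch. It assumes a single scalar latent z, a frozen Gaussian encoder q(z|x), and a linear-Gaussian conditional network r(z|y). None of these choices come from the paper, whose architecture and exact objective may differ. Under these assumptions, with variational distribution q(z|x) and pretrained decoder p(x|z), the standard bound log p(x|y) >= E_q(z|x)[log p(x|z)] - KL(q(z|x) || r(z|y)) holds. With the pretrained parts frozen, only the KL term depends on the parameters of r, so the sketch minimizes that term alone. The names `frozen_encoder`, `train_conditional_net`, and `DATA` are hypothetical.

```python
import math

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ) for scalars."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
            - 0.5)

def frozen_encoder(x):
    """Stand-in for the pretrained encoder q(z | x). Hypothetical fixed
    parameters; nothing in this function is ever updated."""
    return 0.8 * x, 0.3  # (mean, std)

# Toy pairs (x, y): the conditioning input y reveals only the sign of x,
# so two different x values share each y (assumed data, for illustration).
DATA = [(-1.5, -1.0), (-0.5, -1.0), (0.5, 1.0), (1.5, 1.0)]

def train_conditional_net(data, steps=2000, lr=0.05):
    """Fit r(z | y) = N(w*y + b, exp(s)^2) by gradient descent on the
    average KL( q(z | x) || r(z | y) ). Only w, b, s are trained."""
    w, b, s = 0.0, 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        gw = gb = gs = 0.0
        for x, y in data:
            mu_q, sig_q = frozen_encoder(x)
            mu_p, var_p = w * y + b, math.exp(2.0 * s)
            diff = mu_p - mu_q
            gw += diff * y / var_p                        # dKL/dw
            gb += diff / var_p                            # dKL/db
            gs += 1.0 - (sig_q ** 2 + diff ** 2) / var_p  # dKL/ds
        w -= lr * gw / n
        b -= lr * gb / n
        s -= lr * gs / n
    return w, b, s

def average_kl(data, w, b, s):
    """Average training loss for given parameters of r(z | y)."""
    total = 0.0
    for x, y in data:
        mu_q, sig_q = frozen_encoder(x)
        total += gaussian_kl(mu_q, sig_q, w * y + b, math.exp(s))
    return total / len(data)
```

On this toy data the minimizer can be worked out by hand: w = 0.8, b = 0, and a standard deviation of 0.5 for r(z|y). That is wider than the encoder's 0.3, because y leaves x partly undetermined. This is the sense in which the conditional model represents uncertainty rather than committing to a single completion.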