CV · Apr 12, 2018

A Variational U-Net for Conditional Appearance and Shape Generation

Patrick Esser, Ekaterina Sutter, Björn Ommer

arXiv:1804.04694v1 · 443 citations · code available

Originality: Incremental advance

AI Analysis

The paper addresses the challenge of generating realistic images with controlled shape and appearance for applications in computer vision and graphics. The contribution appears incremental, because it builds on the existing U-Net and VAE frameworks.

The paper tackles the problem of generating images of objects that undergo spatial deformations by modeling shape and appearance separately, using a conditional U-Net and a variational autoencoder. It reports significant improvements over state-of-the-art methods on datasets such as COCO and DeepFashion.

Deep generative models have demonstrated great performance in image synthesis. However, results deteriorate in case of spatial deformations, since they generate images of objects directly, rather than modeling the intricate interplay of their inherent shape and appearance. We present a conditional U-Net for shape-guided image generation, conditioned on the output of a variational autoencoder for appearance. The approach is trained end-to-end on images, without requiring samples of the same object with varying pose or appearance. Experiments show that the model enables conditional image generation and transfer. Therefore, either shape or appearance can be retained from a query image, while freely altering the other. Moreover, appearance can be sampled due to its stochastic latent representation, while preserving shape. In quantitative and qualitative experiments on COCO, DeepFashion, shoes, Market-1501 and handbags, the approach demonstrates significant improvements over the state-of-the-art.
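The abstract's core idea is that shape and appearance reach the generator through separate paths, with appearance encoded as a stochastic latent. The toy sketch below illustrates that idea. It is not the authors' code. Images here are 1-D intensity lists and shape is a binary mask. The functions `encode_appearance` and `generate` are hypothetical stand-ins for the VAE encoder and the conditional U-Net. The KL term assumes a standard-normal prior, as in a plain VAE; the paper's own prior may differ.

```python
import math
import random

# Toy illustration only, NOT the paper's implementation.
# A 1-D "image" is a list of intensities; "shape" is a binary mask of equal length.
# encode_appearance and generate are hypothetical stand-ins for the VAE encoder
# and the shape-conditioned U-Net decoder.

def encode_appearance(image, mask):
    """Hypothetical encoder: Gaussian parameters (mu, log_var) from foreground pixels."""
    fg = [p for p, m in zip(image, mask) if m]
    mean = sum(fg) / len(fg)
    var = sum((p - mean) ** 2 for p in fg) / len(fg)
    return [mean], [math.log(var + 1e-6)]

def reparameterize(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1).
    In an autodiff framework this lets gradients flow to mu and log_var."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions
    (assumes a standard-normal prior)."""
    return sum(0.5 * (math.exp(lv) + m * m - 1.0 - lv) for m, lv in zip(mu, log_var))

def generate(mask, z):
    """Hypothetical decoder: shape comes only from the mask, appearance only from z."""
    return [z[0] if m else 0.0 for m in mask]

# Usage: shape from image A, appearance from image B.
rng = random.Random(0)
shape_a = [0, 1, 1, 0]
image_b, mask_b = [0.0, 0.2, 0.8, 0.9], [0, 0, 1, 1]
mu, log_var = encode_appearance(image_b, mask_b)
transfer = generate(shape_a, mu)                               # mean appearance of B on A's shape
sample = generate(shape_a, reparameterize(mu, log_var, rng))   # sampled appearance, same shape
```

Transfer keeps the mask of one image and the appearance code of another; sampling redraws `z` while the mask, and therefore the shape, stays fixed. A real model would replace both toy functions with convolutional networks trained jointly on a reconstruction term plus the KL term.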

