CV | Apr 12, 2018

A Variational U-Net for Conditional Appearance and Shape Generation

arXiv:1804.04694v1 | 443 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of generating realistic images with controlled shape and appearance, with applications in computer vision and graphics. It appears incremental because it builds on the existing U-Net and VAE frameworks.

The paper tackles image generation under spatial deformations by modeling shape and appearance separately: a conditional U-Net generates the image from shape, conditioned on an appearance representation produced by a variational autoencoder. It reports significant improvements over state-of-the-art methods on COCO, DeepFashion, Market-1501, shoes, and handbags.

Deep generative models have demonstrated great performance in image synthesis. However, results deteriorate in case of spatial deformations, since they generate images of objects directly, rather than modeling the intricate interplay of their inherent shape and appearance. We present a conditional U-Net for shape-guided image generation, conditioned on the output of a variational autoencoder for appearance. The approach is trained end-to-end on images, without requiring samples of the same object with varying pose or appearance. Experiments show that the model enables conditional image generation and transfer. Therefore, either shape or appearance can be retained from a query image, while freely altering the other. Moreover, appearance can be sampled due to its stochastic latent representation, while preserving shape. In quantitative and qualitative experiments on COCO, DeepFashion, shoes, Market-1501 and handbags, the approach demonstrates significant improvements over the state-of-the-art.
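The abstract says that appearance can be sampled from a stochastic latent representation while shape is preserved. As a rough illustration of what that usually involves in a variational autoencoder, the sketch below draws an appearance code with the reparameterization trick and computes a KL term between two diagonal Gaussians. The Gaussian parameterization, the choice of comparison distribution, and the names `sample_appearance` and `kl_diag_gaussians` are assumptions made for illustration; they are not taken from the paper or its code.

```python
import math
import random


def sample_appearance(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), per dimension.

    Assumption: the appearance posterior is a diagonal Gaussian
    parameterized by a mean and a log-variance vector.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, log_var_q, mu_p, log_var_p):
        total += 0.5 * (lp - lq
                        + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp)
                        - 1.0)
    return total


# Hypothetical usage: the shape input stays fixed while the appearance
# code is resampled, giving several appearance variants for one shape.
rng = random.Random(0)
mu, log_var = [0.2, -0.1], [-1.0, -1.0]
variants = [sample_appearance(mu, log_var, rng) for _ in range(3)]
```

In a full model, each sampled code would be passed to the shape-conditioned generator together with the unchanged shape input; that network is omitted here.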

Code Implementations: 2 repos
