CV · LG · May 11, 2019

Disentangling Content and Style via Unsupervised Geometry Distillation

arXiv:1905.04538v1 · 17 citations
Originality: Highly original
AI Analysis

This work addresses the challenge of unsupervised disentanglement in computer vision, enabling image synthesis and editing applications that require no human annotation.

The paper tackles the problem of disentangling object representations into orthogonal content and style spaces without supervision, achieving photo-realistic image generation at 256×256 resolution with superior disentanglement and visual analogy quality on four datasets.

It is challenging to disentangle an object into two orthogonal spaces of content and style, since each can influence the visual observation differently and unpredictably. It is rare to have access to a large amount of data that helps separate these influences. In this paper, we present a novel framework to learn this disentangled representation in a completely unsupervised manner. We address this problem with a two-branch autoencoder framework. For the structural content branch, we project the latent factor into a soft structured point tensor and constrain it with losses derived from prior knowledge. This constraint encourages the branch to distill geometry information. The other branch learns the complementary style information. Together, the two branches form an effective framework that can disentangle an object's content-style representation without any human annotation. We evaluate our approach on four image datasets, on which we demonstrate superior disentanglement and visual analogy quality on both synthesized and real-world data. We are able to generate photo-realistic images at 256×256 resolution that are clearly disentangled in content and style.
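The abstract does not spell out how the "soft structured point tensor" is computed. One common way to obtain soft, differentiable point locations from a latent score map is a spatial softmax followed by an expected-coordinate computation. The sketch below illustrates that idea only. It is an assumption, not the paper's confirmed method, and the function names (`spatial_softmax`, `soft_point`, `soft_structured_points`) are hypothetical.

```python
import math


def spatial_softmax(heatmap):
    """Normalize a 2D score map (a list of rows) into a probability map."""
    # Subtract the maximum score before exponentiating for numerical stability.
    peak = max(max(row) for row in heatmap)
    exps = [[math.exp(v - peak) for v in row] for row in heatmap]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def soft_point(heatmap):
    """Expected (x, y) location under the softmax map, in [0, 1] coordinates.

    Assumes the map has at least two rows and two columns.
    """
    probs = spatial_softmax(heatmap)
    height, width = len(probs), len(probs[0])
    x = sum(p * (j / (width - 1)) for row in probs for j, p in enumerate(row))
    y = sum(p * (i / (height - 1)) for i, row in enumerate(probs) for p in row)
    return x, y


def soft_structured_points(heatmaps):
    """Map K score maps to K soft points (hypothetical stand-in for the content code)."""
    return [soft_point(h) for h in heatmaps]


if __name__ == "__main__":
    # One map peaked at the top-right corner, one completely flat map.
    peaked = [[0.0, 0.0, 50.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    flat = [[0.0] * 3 for _ in range(3)]
    print(soft_structured_points([peaked, flat]))
```

Because each point is an expectation over the whole map rather than a hard argmax, it varies smoothly with the scores, which is what makes this kind of representation trainable with geometry-based losses.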

Code Implementations: 1 repo