CV, LG | Jun 29, 2021

An Image is Worth More Than a Thousand Words: Towards Disentanglement in the Wild

arXiv:2106.15610v2 | 41 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of disentangling image attributes with minimal manual effort, which enables more flexible image editing. The advance is incremental, since it builds on existing approaches that rely on limited supervision.

The paper tackles disentanglement of the factors of variation in real-world images when only some factors are labeled. It proposes a method that disentangles the partially labeled factors and separates the residual factors that are never specified. The authors demonstrate the method on synthetic benchmarks, then apply it to real domains such as human faces, using CLIP for zero-shot annotation, and report state-of-the-art image manipulation results.

Unsupervised disentanglement has been shown to be theoretically impossible without inductive biases on the models and the data. As an alternative approach, recent methods rely on limited supervision to disentangle the factors of variation and allow their identifiability. While annotating the true generative factors is only required for a limited number of observations, we argue that it is infeasible to enumerate all the factors of variation that describe a real-world image distribution. To this end, we propose a method for disentangling a set of factors which are only partially labeled, as well as separating the complementary set of residual factors that are never explicitly specified. Our success in this challenging setting, demonstrated on synthetic benchmarks, gives rise to leveraging off-the-shelf image descriptors to partially annotate a subset of attributes in real image domains (e.g. of human faces) with minimal manual effort. Specifically, we use a recent language-image embedding model (CLIP) to annotate a set of attributes of interest in a zero-shot manner and demonstrate state-of-the-art disentangled image manipulation results.
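
The zero-shot annotation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes image and text-prompt embeddings from a joint language-image model are already available as plain vectors; the toy vectors, the function names, and the `margin` threshold are hypothetical and not taken from the paper. An image receives the label of its closest prompt only when that match is clearly better than the runner-up, so ambiguous images stay unlabeled and the annotation is partial.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def zero_shot_annotate(image_emb, prompt_embs, margin=0.1):
    """Return the index of the best-matching prompt, or None if ambiguous.

    image_emb and prompt_embs stand in for embeddings from a joint
    language-image model. margin is a hypothetical confidence threshold
    (an assumption of this sketch). Requires at least two prompts.
    """
    sims = [cosine(image_emb, p) for p in prompt_embs]
    ranked = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if sims[best] - sims[runner_up] < margin:
        return None  # leave this image unlabeled for this attribute
    return best


# Toy 3-d embeddings, made up for illustration only.
prompts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # e.g. two values of one attribute
images = [[0.9, 0.1, 0.2], [0.1, 0.8, 0.1], [0.5, 0.5, 0.3]]
labels = [zero_shot_annotate(x, prompts) for x in images]
print(labels)
```

In this toy run the third image is equally close to both prompts, so it is left unlabeled. The partially labeled setting described in the abstract is meant to tolerate such gaps.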

Code Implementations: 1 repo
