LG | AI | CV | Jun 12, 2021

D2C: Diffusion-Denoising Models for Few-shot Conditional Generation

arXiv:2106.06819v1 | 143 citations
Originality: Incremental advance
AI Analysis

The paper addresses the high cost of labeled data for conditional generative models of images, offering an incremental improvement over existing methods.

The paper tackles the expense of supervision in conditional image generation by proposing D2C, a method that adapts unconditional VAEs to few-shot conditional generation. D2C outperforms state-of-the-art VAEs and diffusion models when conditioning on new labels, and its image manipulations are faster to produce than StyleGAN2's and preferred by human evaluators.

Conditional generative models of high-dimensional images have many applications, but supervision signals from conditions to images can be expensive to acquire. This paper describes Diffusion-Decoding models with Contrastive representations (D2C), a paradigm for training unconditional variational autoencoders (VAEs) for few-shot conditional image generation. D2C uses a learned diffusion-based prior over the latent representations to improve generation and contrastive self-supervised learning to improve representation quality. D2C can adapt to novel generation tasks conditioned on labels or manipulation constraints by learning from as few as 100 labeled examples. On conditional generation from new labels, D2C achieves superior performance over state-of-the-art VAEs and diffusion models. On conditional image manipulation, D2C generations are two orders of magnitude faster to produce than StyleGAN2 ones and are preferred by 50%-60% of the human evaluators in a double-blind study.
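To make the "contrastive self-supervised learning" ingredient concrete, here is a minimal sketch of an InfoNCE-style contrastive loss over latent codes. The abstract does not say which contrastive objective D2C uses, so the choice of InfoNCE, the function name `info_nce_loss`, and the `temperature` value are all assumptions made for illustration, not the authors' implementation. Row i of `z_a` and row i of `z_b` stand for the latent codes of two augmented views of the same image.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE loss (an assumed stand-in for the paper's contrastive term).

    z_a[i] and z_b[i] form a positive pair; every other z_b[j] in the
    batch serves as a negative for z_a[i].
    """
    a = [_normalize(v) for v in z_a]
    b = [_normalize(v) for v in z_b]
    total = 0.0
    for i, a_i in enumerate(a):
        # Temperature-scaled cosine similarity of a_i to every candidate in b.
        logits = [sum(x * y for x, y in zip(a_i, b_j)) / temperature for b_j in b]
        # Numerically stable log-sum-exp over the candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the true positive.
        total += log_denom - logits[i]
    return total / len(a)


# Toy latent codes: three mutually orthogonal "images".
views = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = info_nce_loss(views, views)                    # positives match
shuffled = info_nce_loss(views, views[1:] + views[:1])   # positives mismatched
print(aligned < shuffled)
```

The loss is small when each code is closest to its own positive and large when the pairing is scrambled, which is the property a contrastive term relies on to shape the latent space. In a setup like the one the abstract describes, a term of this kind would be added to the autoencoder's training objective alongside the diffusion prior over the latents.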

Code Implementations: 3 repos
