CV | Jul 12, 2023

DiffuseGAE: Controllable and High-fidelity Image Manipulation from Disentangled Representation

arXiv:2307.05899v1 | 7 citations | h-index: 6
Originality: Incremental advance
AI Analysis

This work addresses the problem of generic and controllable image editing for AI and computer vision researchers, offering an incremental improvement over prior methods.

The paper tackles the lack of a low-dimensional, interpretable latent code in diffusion probabilistic models for image manipulation by proposing DiffuseGAE, a module that improves disentanglement in diffusion autoencoders, enabling multi-attribute editing with high sample quality and reduced computational costs.

Diffusion probabilistic models (DPMs) have shown remarkable results on various image synthesis tasks such as text-to-image generation and image inpainting. However, compared to other generative methods like VAEs and GANs, DPMs lack a low-dimensional, interpretable, and well-decoupled latent code. Recently, diffusion autoencoders (Diff-AE) were proposed to explore the potential of DPMs for representation learning via autoencoding. Diff-AE provides an accessible latent space that exhibits remarkable interpretability, allowing us to manipulate image attributes based on latent codes from that space. However, previous works are not generic, as they operate on only a few attributes. To further explore the latent space of Diff-AE and achieve a generic editing pipeline, we propose a module called Group-supervised AutoEncoder (dubbed GAE) for Diff-AE to achieve better disentanglement of the latent code. GAE is trained via an attribute-swap strategy to acquire the latent codes for example-based multi-attribute image manipulation. We empirically demonstrate that our method enables multi-attribute manipulation and achieves convincing sample quality and attribute alignment, while significantly reducing computational requirements compared to pixel-based approaches for representational decoupling. Code will be released soon.
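The abstract does not spell out how the attribute swap works. As a rough illustration only, the sketch below assumes a latent code in which each attribute owns one fixed, contiguous slice, and swaps a single slice between a source code and a reference code. The slice layout, the attribute names, the chunk size, and the helper names `make_layout` and `swap_attribute` are all hypothetical; the paper's actual GAE training procedure may differ.

```python
from typing import Dict, List, Sequence, Tuple


def make_layout(attributes: Sequence[str], chunk: int) -> Dict[str, slice]:
    """Assign each attribute a contiguous slice of the latent code (assumed layout)."""
    return {
        name: slice(i * chunk, (i + 1) * chunk)
        for i, name in enumerate(attributes)
    }


def swap_attribute(
    z_a: List[float],
    z_b: List[float],
    layout: Dict[str, slice],
    attribute: str,
) -> Tuple[List[float], List[float]]:
    """Return copies of z_a and z_b with the slice for `attribute` exchanged."""
    part = layout[attribute]
    out_a, out_b = list(z_a), list(z_b)  # copy so the inputs stay untouched
    out_a[part], out_b[part] = z_b[part], z_a[part]
    return out_a, out_b


# Toy example: three hypothetical attributes, two latent dimensions each.
layout = make_layout(["hair", "smile", "glasses"], chunk=2)
z_src = [0.0] * 6  # latent code of the image to edit
z_ref = [1.0] * 6  # latent code of the example image
edited, _ = swap_attribute(z_src, z_ref, layout, "smile")
```

In this toy setting, `edited` keeps the source values everywhere except the `smile` slice, which now carries the reference values. In a Diff-AE-style pipeline, a code like this would then be passed to the diffusion decoder to render the edited image.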

