CV | AI | Dec 17, 2024

Unsupervised Region-Based Image Editing of Denoising Diffusion Models

arXiv:2412.12912v1 | 1 citation | h-index: 5 | AAAI
Originality: Incremental advance
AI Analysis

The method enables precise, unsupervised, region-based image editing with pre-trained diffusion models. The advance is incremental, since it builds on existing latent-space exploration methods.

The paper tackles the problem of identifying semantic attributes in the latent space of pre-trained diffusion models without external supervision, achieving state-of-the-art performance and even surpassing supervised approaches for some face attributes.

Although diffusion models have achieved remarkable success in the field of image generation, their latent space remains under-explored. Current methods for identifying semantics within latent space often rely on external supervision, such as textual information and segmentation masks. In this paper, we propose a method to identify semantic attributes in the latent space of pre-trained diffusion models without any further training. By projecting the Jacobian of the targeted semantic region into a low-dimensional subspace which is orthogonal to the non-masked regions, our approach facilitates precise semantic discovery and control over local masked areas, eliminating the need for annotations. We conducted extensive experiments across multiple datasets and various architectures of diffusion models, achieving state-of-the-art performance. In particular, for some specific face attributes, the performance of our proposed method even surpasses that of supervised approaches, demonstrating its superior ability in editing local image properties.
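
The projection step described in the abstract can be illustrated with a small linear-algebra sketch. This is not the authors' implementation: it assumes the Jacobians of the masked and non-masked regions with respect to a latent vector are already available as explicit matrices, and the function name `region_edit_directions`, the parameters `k` and `rel_tol`, and the synthetic data are all hypothetical. The abstract does not say which latent is used or how the Jacobian is computed, so those parts are left out.

```python
import numpy as np

def region_edit_directions(J_mask, J_rest, k=3, rel_tol=1e-3):
    """Hypothetical helper: latent directions that change the masked region
    while (to first order) leaving the non-masked region unchanged.

    J_mask: (pixels_inside_mask, d) Jacobian of the masked region.
    J_rest: (pixels_outside_mask, d) Jacobian of the non-masked region.
    """
    d = J_rest.shape[1]
    # Right singular vectors of J_rest span the latent space; those with
    # (near-)zero singular value do not affect the non-masked region.
    _, s, Vt = np.linalg.svd(J_rest, full_matrices=True)
    s_full = np.zeros(d)
    s_full[: len(s)] = s
    null_basis = Vt[s_full <= rel_tol * s_full.max()].T  # shape (d, r)
    if null_basis.shape[1] == 0:
        raise ValueError("no latent direction leaves the non-masked region unchanged")
    # Restrict the masked-region Jacobian to that subspace and keep the
    # directions with the largest effect inside the mask.
    _, s_m, Wt = np.linalg.svd(J_mask @ null_basis, full_matrices=False)
    directions = null_basis @ Wt[:k].T  # shape (d, at most k), orthonormal columns
    return directions, s_m[:k]

# Synthetic example (assumed data): an 8-dimensional latent, with the
# non-masked region depending on only 3 latent directions.
rng = np.random.default_rng(0)
J_rest = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 8))
J_mask = rng.normal(size=(10, 8))
directions, strengths = region_edit_directions(J_mask, J_rest, k=2)

# First-order change outside the mask along each direction (close to zero).
leak = np.abs(J_rest @ directions).max()
```

In this toy setting, moving the latent along a column of `directions` changes the masked pixels while the first-order change outside the mask stays near zero, which is the sense in which the subspace is orthogonal to the non-masked regions. With a real diffusion model the Jacobian is far too large to form explicitly, so a practical version would need an approximation that this sketch does not cover.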

