CV | LG | May 8, 2022

On Conditioning the Input Noise for Controlled Image Generation with Diffusion Models

arXiv:2205.03859v1 | 26 citations | h-index: 37
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for better control in image generation for applications like editing and 3-D object creation, but it appears incremental as it builds on existing diffusion model techniques.

The paper tackles the limited control that diffusion models offer in conditional image generation. It proposes conditioning the input noise with crafted artifacts so that images can be generated from semantic attributes, and it reports experiments across several examples and conditional settings that indicate the approach's potential.

Conditional image generation has paved the way for several breakthroughs in image editing, stock photo generation, and 3-D object generation. It continues to be a significant area of interest with the rise of new state-of-the-art methods based on diffusion models. However, diffusion models provide very little control over the generated image, which led subsequent works to explore techniques such as classifier guidance, which provides a way to trade off diversity against fidelity. In this work, we explore techniques to condition diffusion models with carefully crafted input noise artifacts. This allows generation of images conditioned on semantic attributes. It differs from existing approaches, which input Gaussian noise and then introduce conditioning at the diffusion model's inference step. Our experiments over several examples and conditional settings show the potential of our approach.
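
The contrast drawn in the abstract, between starting from plain Gaussian noise and starting from noise that already carries a crafted artifact, can be illustrated with a toy sketch. The abstract does not say how the artifacts are constructed, so everything below is an assumption made for illustration: the function names `gaussian_noise` and `noise_with_artifact`, the `box` and `shift` parameters, and the choice of a simple mean shift inside a rectangular region. It is not the authors' method, and no diffusion model is run.

```python
import numpy as np


def gaussian_noise(shape, rng):
    """Usual starting point for sampling: i.i.d. N(0, 1) noise."""
    return rng.standard_normal(shape)


def noise_with_artifact(shape, box, shift, rng):
    """Hypothetical crafted noise: Gaussian noise whose mean is shifted
    inside a rectangular region (an illustrative assumption only)."""
    noise = rng.standard_normal(shape)
    top, left, height, width = box
    # Add a constant offset to every channel inside the box.
    noise[..., top:top + height, left:left + width] += shift
    return noise


rng = np.random.default_rng(0)
shape = (3, 64, 64)  # channels, height, width
box = (16, 16, 32, 32)  # top, left, height, width

plain = gaussian_noise(shape, rng)
crafted = noise_with_artifact(shape, box=box, shift=1.5, rng=rng)

# Mask selecting the pixels inside the box, for comparing statistics.
inside = np.zeros(shape[1:], dtype=bool)
inside[16:48, 16:48] = True

print("plain mean inside box:  ", plain[:, inside].mean())
print("crafted mean inside box:", crafted[:, inside].mean())
print("crafted mean outside:   ", crafted[:, ~inside].mean())
```

In this sketch the conditioning signal lives entirely in the initial tensor, so a sampler would consume `crafted` exactly as it would consume `plain`, with no extra guidance term at each inference step. Whether a given diffusion model responds to such an artifact in a semantically meaningful way is the empirical question the paper investigates.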

