CV · Feb 8, 2024

Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models

arXiv:2402.05375v1 · 60 citations · h-index: 23 · ICLR
Originality: Incremental advance
AI Analysis

This work addresses a practical limitation of text-to-image generation for users who need precise control over content, though it is an incremental improvement on existing methods.

The paper tackles the problem of text-to-image diffusion models generating content that the prompt explicitly asks them to omit. It introduces soft-weighted regularization and inference-time text embedding optimization to suppress undesired content while encouraging desired content. Extensive experiments show the approach is effective on both pixel-space and latent-space diffusion models.

The success of recent text-to-image diffusion models is largely due to their capacity to be guided by a complex text prompt, which enables users to precisely describe the desired content. However, these models struggle to effectively suppress the generation of undesired content that the prompt explicitly requests be omitted from the generated image. In this paper, we analyze how to manipulate the text embeddings and remove unwanted content from them. We introduce two contributions, which we refer to as $\textit{soft-weighted regularization}$ and $\textit{inference-time text embedding optimization}$. The first regularizes the text embedding matrix and effectively suppresses the undesired content. The second further suppresses generation of the unwanted content in the prompt and encourages generation of the desired content. We evaluate our method quantitatively and qualitatively in extensive experiments, validating its effectiveness. Furthermore, our method generalizes to both pixel-space diffusion models (i.e. DeepFloyd-IF) and latent-space diffusion models (i.e. Stable Diffusion).
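
The abstract says the first contribution regularizes the text embedding matrix, but it does not give the formula. As a rough illustration only, the sketch below assumes the regularization works by taking the singular value decomposition of a stack of embeddings and shrinking the dominant singular values. The function name `soft_weighted_regularization`, the input layout, and the weighting `exp(-s) * s` are all assumptions made for this example, not details confirmed by this page.

```python
import numpy as np


def soft_weighted_regularization(embeddings: np.ndarray) -> np.ndarray:
    """Damp the dominant singular components of a stack of text embeddings.

    embeddings: (n_tokens, dim) array of the embeddings assumed to carry the
    unwanted concept. This layout is hypothetical, chosen for illustration.
    """
    # Thin SVD: embeddings = u @ diag(s) @ vt
    u, s, vt = np.linalg.svd(embeddings, full_matrices=False)
    # Assumed soft weighting: every singular value is shrunk by a factor
    # exp(-s), so the largest values are reduced the most in relative terms.
    s_soft = np.exp(-s) * s
    # Rebuild the matrix from the re-weighted singular values.
    return (u * s_soft) @ vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: four token embeddings sharing one strong common direction.
    shared = rng.normal(size=(1, 8))
    tokens = 5.0 * np.ones((4, 1)) @ shared + 0.01 * rng.normal(size=(4, 8))
    damped = soft_weighted_regularization(tokens)
    print("singular values before:", np.linalg.svd(tokens, compute_uv=False))
    print("singular values after: ", np.linalg.svd(damped, compute_uv=False))
```

In this toy setup the shared direction stands in for the unwanted concept: it dominates the spectrum before the call and is strongly attenuated afterwards, while the overall matrix shape is preserved so it could be written back into a prompt embedding.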

Code Implementations: 1 repo
