Improving image synthesis with diffusion-negative sampling
This work addresses a practical problem for users of diffusion models in image synthesis: it automates the generation of negative prompts. The contribution is incremental, however, since it builds on existing prompting techniques.
The paper tackles the difficulty of finding effective negative prompts for diffusion models in image generation. It proposes a diffusion-negative prompting (DNP) strategy, which uses diffusion-negative sampling (DNS) to generate negative prompts automatically. According to the authors' experiments and human evaluations, these negative prompts improve prompt adherence and image quality.
For image generation with diffusion models (DMs), a negative prompt n can be used to complement the text prompt p, helping define properties not desired in the synthesized image. While this improves prompt adherence and image quality, finding good negative prompts is challenging. We argue that this is due to a semantic gap between humans and DMs, which makes good negative prompts for DMs appear unintuitive to humans. To bridge this gap, we propose a new diffusion-negative prompting (DNP) strategy. DNP is based on a new procedure to sample images that are least compliant with p under the distribution of the DM, denoted as diffusion-negative sampling (DNS). Given p, one such image is sampled, which is then translated into natural language by the user or a captioning model, to produce the negative prompt n*. The pair (p, n*) is finally used to prompt the DM. DNS is straightforward to implement and requires no training. Experiments and human evaluations show that DNP performs well both quantitatively and qualitatively and can be easily combined with several DM variants.
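The DNP procedure described above (sample a negative image, caption it, then prompt with the pair (p, n*)) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the callables `sample_negative_image`, `caption_image`, and `generate` are hypothetical placeholders for a DNS sampler, a captioning model (or a human), and the diffusion model itself. The helper `guided_noise` shows how many classifier-free guidance implementations use a negative prompt, namely by letting its noise prediction take the place of the unconditional one; the abstract does not state which guidance rule the paper uses, so that part is an assumption as well.

```python
from typing import Any, Callable, List, Sequence, Tuple


def guided_noise(eps_pos: Sequence[float],
                 eps_neg: Sequence[float],
                 scale: float) -> List[float]:
    """Combine two noise predictions with classifier-free guidance.

    Assumption: the negative-prompt prediction `eps_neg` replaces the
    unconditional prediction, as in common Stable Diffusion pipelines.
    With scale == 1 the result equals `eps_pos`; larger scales push the
    estimate further away from `eps_neg`.
    """
    return [n + scale * (p - n) for p, n in zip(eps_pos, eps_neg)]


def diffusion_negative_prompting(
    prompt: str,
    sample_negative_image: Callable[[str], Any],  # hypothetical DNS sampler
    caption_image: Callable[[Any], str],          # hypothetical captioner or user
    generate: Callable[[str, str], Any],          # hypothetical DM call taking (p, n*)
) -> Tuple[Any, str]:
    """Run the three DNP steps from the abstract for a single prompt."""
    # Step 1: sample an image that is least compliant with the prompt (DNS).
    negative_image = sample_negative_image(prompt)
    # Step 2: translate that image into natural language to obtain n*.
    negative_prompt = caption_image(negative_image)
    # Step 3: prompt the diffusion model with the pair (p, n*).
    image = generate(prompt, negative_prompt)
    return image, negative_prompt
```

Passing the three components as arguments keeps the sketch independent of any particular diffusion library, which matches the abstract's claim that DNP needs no training and can be combined with several DM variants.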