CV · Apr 6, 2024

InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization

arXiv:2404.04650v1 · 123 citations · h-index: 6 · Has Code · CVPR
Originality: Incremental advance
AI Analysis

The work addresses a key challenge in text-to-image generation for users who need precise prompt adherence, though it is an incremental improvement on existing diffusion models.

The paper attributes misalignment between generated images and text prompts in diffusion models to invalid initial noise, and proposes Initial Noise Optimization (InitNO) to refine that noise, improving the semantic faithfulness of generated images.

Recent strides in the development of diffusion models, exemplified by advancements such as Stable Diffusion, have underscored their remarkable prowess in generating visually compelling images. However, the imperative of achieving a seamless alignment between the generated image and the provided prompt persists as a formidable challenge. This paper traces the root of these difficulties to invalid initial noise, and proposes a solution in the form of Initial Noise Optimization (InitNO), a paradigm that refines this noise. For a given text prompt, not every random noise sample is effective for synthesizing a semantically faithful image. We design the cross-attention response score and the self-attention conflict score to evaluate the initial noise, bifurcating the initial latent space into valid and invalid sectors. A strategically crafted noise optimization pipeline is developed to guide the initial noise towards valid regions. Our method, validated through rigorous experimentation, shows a commendable proficiency in generating images in strict accordance with text prompts. Our code is available at https://github.com/xiefan-guo/initno.
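The abstract's two-stage idea (score the initial noise with attention-based criteria, then push invalid noise toward the valid region) can be illustrated with a small, self-contained Python sketch. Everything here is a toy assumption, not the authors' code: a 1-D list of floats stands in for the Stable Diffusion latent, `toy_attention` and `TOKEN_KEYS` are hypothetical replacements for the U-Net's cross- and self-attention maps, the exact formulas for the two scores and the thresholds `tau_cross` and `tau_self` are guesses at the spirit of the method, and the mean/variance reparameterization with a KL penalty toward N(0, I) is an assumed way to keep optimized noise close to the Gaussian prior. Finite differences replace autograd. The linked repository is the reference for the actual formulation.

```python
import math
import random

# Toy stand-ins (hypothetical): a 1-D "image" of latent pixels plus one key per
# subject token. A real pipeline would read attention maps out of the U-Net.
TOKEN_KEYS = [-1.0, 1.0]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def toy_attention(z, sharpness=4.0):
    """Return (cross_maps, self_attn) for a latent z given as a list of floats.
    cross_maps[t][p]: share of pixel p's attention on token t (softmax over tokens);
    self_attn[p]: distribution over pixels that favours similar latent values."""
    per_pixel = [softmax([-sharpness * (zp - k) ** 2 for k in TOKEN_KEYS]) for zp in z]
    cross_maps = [[row[t] for row in per_pixel] for t in range(len(TOKEN_KEYS))]
    self_attn = [softmax([-sharpness * (zp - zq) ** 2 for zq in z]) for zp in z]
    return cross_maps, self_attn


def cross_attention_response_score(cross_maps):
    # Assumed form: 1 - peak response of the most neglected subject token (lower is better).
    return max(1.0 - max(m) for m in cross_maps)


def self_attention_conflict_score(cross_maps, self_attn):
    # Assumed form: take each token's self-attention row at its peak cross-attention
    # pixel and report the largest pairwise overlap (0 = disjoint, 1 = identical).
    rows = [self_attn[max(range(len(m)), key=m.__getitem__)] for m in cross_maps]
    overlaps = [sum(min(a, b) for a, b in zip(rows[i], rows[j]))
                for i in range(len(rows)) for j in range(i + 1, len(rows))]
    return max(overlaps, default=0.0)


def scores(z):
    cross_maps, self_attn = toy_attention(z)
    return (cross_attention_response_score(cross_maps),
            self_attention_conflict_score(cross_maps, self_attn))


def is_valid(z, tau_cross=0.2, tau_self=0.3):
    # Hypothetical thresholds that split the initial latent space into valid/invalid.
    s_cross, s_self = scores(z)
    return s_cross < tau_cross and s_self < tau_self


def loss(params, eps, lam=0.1):
    # Assumed reparameterization z = mu + sigma * eps, with params = [mu..., log_sigma...].
    n = len(eps)
    mu, log_sigma = params[:n], params[n:]
    z = [m + math.exp(ls) * e for m, ls, e in zip(mu, log_sigma, eps)]
    s_cross, s_self = scores(z)
    # Mean per-dimension KL(N(mu, sigma^2) || N(0, 1)) keeps z near the Gaussian prior.
    kl = sum(0.5 * (math.exp(2 * ls) + m * m - 1 - 2 * ls)
             for m, ls in zip(mu, log_sigma)) / n
    return s_cross + s_self + lam * kl, z


def optimize_noise(eps, steps=40, lr=0.1, h=1e-3):
    # Central finite differences stand in for autograd; gradients are clipped to
    # [-1, 1] so each step moves any parameter by at most lr.
    params = [0.0] * (2 * len(eps))  # mu = 0, log_sigma = 0, so z starts at eps
    best_loss, best_z = loss(params, eps)
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            up, dn = params[:], params[:]
            up[i] += h
            dn[i] -= h
            g = (loss(up, eps)[0] - loss(dn, eps)[0]) / (2 * h)
            grad.append(max(-1.0, min(1.0, g)))
        params = [p - lr * g for p, g in zip(params, grad)]
        cur_loss, cur_z = loss(params, eps)
        if cur_loss < best_loss:  # keep the best candidate seen so far
            best_loss, best_z = cur_loss, cur_z
        if is_valid(best_z):
            break
    return best_loss, best_z


def initno_sketch(n_pixels=8, max_rounds=5, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(max_rounds):
        eps = [rng.gauss(0.0, 1.0) for _ in range(n_pixels)]
        if is_valid(eps):
            return eps  # this noise already lies in the valid region
        cur_loss, z = optimize_noise(eps)
        if is_valid(z):
            return z
        if best is None or cur_loss < best[0]:
            best = (cur_loss, z)
    return best[1]  # otherwise fall back to the lowest-loss candidate
```

Calling `scores(initno_sketch())` returns the cross-attention response score and self-attention conflict score of the selected noise. Tracking the lowest-loss candidate rather than the last iterate means a round never returns noise with a higher loss than the sample it started from, and drawing fresh noise when a round fails reflects the intuition that some initial noises are simply invalid for a given prompt.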

Code Implementations: 1 repo
