CV · May 2, 2024

LocInv: Localization-aware Inversion for Text-Guided Image Editing

arXiv:2405.01496v1 · 11 citations · h-index: 23 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses a specific issue in text-guided image editing for users of text-to-image models, offering incremental improvements over prior techniques.

The paper tackles text-guided image editing with diffusion models, where existing methods often edit unintended regions because their cross-attention maps are inaccurate. It proposes LocInv, which uses localization priors such as segmentation maps or bounding boxes to refine these maps, and reports fine-grained editing with superior quantitative and qualitative results on a subset of the COCO dataset.

Large-scale Text-to-Image (T2I) diffusion models demonstrate significant generation capabilities based on textual prompts. Based on the T2I diffusion models, text-guided image editing research aims to empower users to manipulate generated images by altering the text prompts. However, existing image editing techniques are prone to editing over unintentional regions that are beyond the intended target area, primarily due to inaccuracies in cross-attention maps. To address this problem, we propose Localization-aware Inversion (LocInv), which exploits segmentation maps or bounding boxes as extra localization priors to refine the cross-attention maps in the denoising phases of the diffusion process. Through the dynamic updating of tokens corresponding to noun words in the textual input, we are compelling the cross-attention maps to closely align with the correct noun and adjective words in the text prompt. Based on this technique, we achieve fine-grained image editing over particular objects while preventing undesired changes to other regions. Our method LocInv, based on the publicly available Stable Diffusion, is extensively evaluated on a subset of the COCO dataset, and consistently obtains superior results both quantitatively and qualitatively. The code will be released at https://github.com/wangkai930418/DPL
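The core idea in the abstract, updating a noun token so that its cross-attention map lines up with a segmentation mask, can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`cross_attention`, `localization_loss`, `refine_token`), the cosine-similarity loss, the synthetic 8x8 "image" features, and the finite-difference gradient (a stand-in for backpropagation through a real diffusion model) are all assumptions made for illustration.

```python
import numpy as np

def cross_attention(pixel_queries, token_keys):
    """Toy cross-attention: for each pixel, a softmax over text tokens.

    Returns an array of shape (num_pixels, num_tokens).
    """
    d = pixel_queries.shape[1]
    logits = pixel_queries @ token_keys.T / np.sqrt(d)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def localization_loss(attn_map, mask):
    """Assumed loss: 1 - cosine similarity between attention map and mask."""
    a = attn_map.ravel()
    m = mask.ravel().astype(float)
    return 1.0 - (a @ m) / (np.linalg.norm(a) * np.linalg.norm(m) + 1e-8)

def numerical_grad(f, x, eps=1e-4):
    """Central finite differences (stand-in for autograd in this sketch)."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step.flat[i] = eps
        grad.flat[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

def refine_token(pixel_queries, token_keys, token_idx, mask, lr=0.5, steps=50):
    """Update one token embedding so its attention map matches the mask.

    A step is accepted only if it lowers the loss; otherwise lr is halved.
    """
    keys = token_keys.copy()

    def loss_fn(vec):
        trial = keys.copy()
        trial[token_idx] = vec
        attn = cross_attention(pixel_queries, trial)[:, token_idx]
        return localization_loss(attn, mask)

    current = loss_fn(keys[token_idx])
    history = [current]
    for _ in range(steps):
        grad = numerical_grad(loss_fn, keys[token_idx])
        candidate = keys[token_idx] - lr * grad
        new_loss = loss_fn(candidate)
        if new_loss < current:
            keys[token_idx] = candidate
            current = new_loss
            history.append(current)
        else:
            lr *= 0.5
    return keys, history

# Synthetic setup: an 8x8 "image" whose central 4x4 square is the object.
rng = np.random.default_rng(0)
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
pixel_queries = rng.normal(size=(64, 8))
pixel_queries[mask.ravel(), 0] += 2.0   # object pixels share a feature
token_keys = rng.normal(size=(3, 8))    # 3 hypothetical prompt tokens

refined_keys, history = refine_token(pixel_queries, token_keys, token_idx=1, mask=mask)
print(f"localization loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

In a real pipeline the attention maps would come from the diffusion model's cross-attention layers at each denoising step, and the token update would use the framework's automatic differentiation rather than finite differences.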

Code Implementations: 1 repo
