CV | GR | LG | Nov 29, 2021

Blended Diffusion for Text-driven Editing of Natural Images

arXiv:2111.14818v2 | 1243 citations | Has Code
AI Analysis

This work addresses the need for intuitive, language-based image editing tools. It is incremental in that it builds on existing diffusion and language-image models.

The paper tackles the problem of local text-driven editing of natural images by introducing a method that combines CLIP and DDPM to generate realistic edits based on text prompts and ROI masks, achieving superior performance in realism, background preservation, and text matching compared to baselines.

Natural language offers a highly intuitive interface for image editing. In this paper, we introduce the first solution for performing local (region-based) edits in generic natural images, based on a natural language description along with an ROI mask. We achieve our goal by leveraging and combining a pretrained language-image model (CLIP), to steer the edit towards a user-provided text prompt, with a denoising diffusion probabilistic model (DDPM) to generate natural-looking results. To seamlessly fuse the edited region with the unchanged parts of the image, we spatially blend noised versions of the input image with the local text-guided diffusion latent at a progression of noise levels. In addition, we show that adding augmentations to the diffusion process mitigates adversarial results. We compare against several baselines and related methods, both qualitatively and quantitatively, and show that our method outperforms these solutions in terms of overall realism, ability to preserve the background and matching the text. Finally, we show several text-driven editing applications, including adding a new object to an image, removing/replacing/altering existing objects, background replacement, and image extrapolation. Code is available at: https://omriavrahami.com/blended-diffusion-page/
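The blending step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: `guided_step` is a hypothetical stand-in for one CLIP-guided DDPM denoising step, and the `alpha_bars` noise schedule and all function names are assumptions made for illustration. It only shows how, at each noise level, the text-guided latent inside the ROI mask is combined with a correspondingly noised copy of the input image outside it.

```python
import numpy as np


def noise_image(x0, alpha_bar, rng):
    """Sample the forward process q(x_t | x_0) at cumulative signal level alpha_bar."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


def blend(x_fg, x_bg, mask):
    """Keep the guided latent inside the ROI (mask == 1), the noised input elsewhere."""
    return mask * x_fg + (1.0 - mask) * x_bg


def blended_edit(x0, mask, guided_step, alpha_bars, rng):
    """Run a reverse loop that blends at every noise level.

    alpha_bars[t] is the assumed cumulative signal level at step t,
    with alpha_bars[0] == 1.0 meaning "no noise".
    guided_step(x, t) is a hypothetical text-guided denoising step.
    """
    num_steps = len(alpha_bars) - 1
    x = rng.standard_normal(x0.shape)  # start from pure noise
    for t in range(num_steps, 0, -1):
        x_fg = guided_step(x, t)                        # edited-region proposal
        x_bg = noise_image(x0, alpha_bars[t - 1], rng)  # input at matching noise level
        x = blend(x_fg, x_bg, mask)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((8, 8, 3))   # toy "image"
    roi = np.zeros((8, 8, 1))
    roi[2:6, 2:6] = 1.0                      # square region to edit
    schedule = np.linspace(1.0, 1e-3, 11)    # assumed schedule, not from the paper
    # Placeholder step that just shrinks the latent; a real step would call the model.
    result = blended_edit(image, roi, lambda x, t: 0.9 * x, schedule, rng)
    print(result.shape)
```

Because the last blend uses a noise level of zero under this assumed schedule, pixels outside the mask end up as the original input, which is the background-preservation property the abstract refers to.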

Code Implementations: 1 repo
