CV · Apr 17, 2024

TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing

Sherry X. Chen, Yaron Vaxman, Elad Ben Baruch, David Asulin, Aviad Moreshet, Kuo-Chin Lien, Misha Sra, Pradeep Sen

arXiv:2404.11120v1 · 14.719 citations · h-index: 22 · Has Code · CVPR

Originality: Incremental advance

AI Analysis

This work addresses robust and efficient image editing with diffusion models. It appears incremental, since it builds on pre-trained Stable Diffusion and is designed to work with existing variants such as Textual Inversion and DreamBooth.

The paper tackles the challenge of predictable, controllable image editing with pre-trained text-to-image models such as Stable Diffusion. It introduces TiNO-Edit, which optimizes the noise patterns and diffusion timesteps used during editing. The results align better with both the original image and the desired edit, and a new set of loss functions computed in SD's latent domain speeds up the optimization.
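For a rough sense of scale, the snippet below counts how many values a pixel-domain loss and a latent-domain loss compare. It assumes the common Stable Diffusion v1.x setup (512×512 RGB images, a 4-channel latent downsampled 8× by the VAE), which may not match the paper's exact settings. Element counts likely understate the real gap, because a pixel-domain loss also needs a VAE decoder pass at every optimization step.

```python
# Assumed Stable Diffusion v1.x defaults; TiNO-Edit's actual settings may differ.
IMAGE_SIZE = 512       # image height/width in pixels
PIXEL_CHANNELS = 3     # RGB
LATENT_CHANNELS = 4    # SD VAE latent channels
VAE_DOWNSAMPLE = 8     # spatial downsampling factor of the SD VAE


def loss_tensor_sizes(image_size=IMAGE_SIZE):
    """Return (pixel_elements, latent_elements) for one square image."""
    pixel_elems = PIXEL_CHANNELS * image_size * image_size
    latent_side = image_size // VAE_DOWNSAMPLE
    latent_elems = LATENT_CHANNELS * latent_side * latent_side
    return pixel_elems, latent_elems


pixel, latent = loss_tensor_sizes()
print(pixel, latent, pixel // latent)  # 786432 16384 48
```

Under these assumptions, each latent-domain loss evaluation touches 48× fewer values than its pixel-domain counterpart, before accounting for the skipped decoder passes.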

Despite many attempts to leverage pre-trained text-to-image (T2I) models like Stable Diffusion (SD) for controllable image editing, producing good, predictable results remains a challenge. Previous approaches have focused either on fine-tuning pre-trained T2I models on specific datasets to generate certain kinds of images (e.g., with a specific object or person), or on optimizing the weights, text prompts, and/or learning features for each input image in an attempt to coax the image generator into producing the desired result. However, these approaches all have shortcomings and fail to produce good results in a predictable and controllable manner. To address this problem, we present TiNO-Edit, an SD-based method that focuses on optimizing the noise patterns and diffusion timesteps during editing, something previously unexplored in the literature. With this simple change, we are able to generate results that both better align with the original images and reflect the desired result. Furthermore, we propose a set of new loss functions that operate in the latent domain of SD, greatly speeding up the optimization compared to prior approaches, which operate in the pixel domain. Our method can be easily applied to variations of SD, including Textual Inversion and DreamBooth, that encode new concepts and incorporate them into the edited results. We present a host of image-editing capabilities enabled by our approach. Our code is publicly available at https://github.com/SherryXTChen/TiNO-Edit.
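The abstract does not spell out the optimization loop, so the following is only a toy sketch of the general idea it describes: treat the initial noise pattern and a diffusion timestep as free variables and update both by gradient descent on a loss computed directly on latents. Everything concrete here is a hypothetical stand-in, not TiNO-Edit's model, schedule, or loss functions (see the linked repository for those): the cosine noise schedule, the one-step `toy_edit` denoiser, the "prompt latent" `c`, the loss weights, and treating `t` as continuous so it can be differentiated.

```python
import math
import random

K = math.pi / 2  # hypothetical cosine schedule: alpha(t) = cos(K t), sigma(t) = sin(K t)


def toy_edit(z0, c, eps, t):
    """Hypothetical one-step editor: noise z0 to level t, then denoise toward c.

    x_t = a*z0 + s*eps is the noised latent; the toy denoiser returns a*x_t + s*c,
    so a larger t moves the result further toward the prompt latent c.
    """
    a, s = math.cos(K * t), math.sin(K * t)
    return [a * a * zi + a * s * ei + s * ci for zi, ei, ci in zip(z0, eps, c)]


def latent_loss(out, z0, c, w_orig=1.0, w_edit=1.0):
    """Toy latent-domain loss: stay close to the original, move toward the edit."""
    return sum(w_orig * (o - zi) ** 2 + w_edit * (o - ci) ** 2
               for o, zi, ci in zip(out, z0, c))


def gradients(z0, c, eps, t, w_orig=1.0, w_edit=1.0):
    """Analytic gradients of latent_loss(toy_edit(...)) w.r.t. eps and t."""
    a, s = math.cos(K * t), math.sin(K * t)
    out = toy_edit(z0, c, eps, t)
    g = [2 * w_orig * (o - zi) + 2 * w_edit * (o - ci)
         for o, zi, ci in zip(out, z0, c)]
    grad_eps = [gi * a * s for gi in g]
    # d out_i / dt for out_i = a^2 z0_i + a s eps_i + s c_i
    grad_t = sum(gi * (-2 * K * a * s * zi + K * (a * a - s * s) * ei + K * a * ci)
                 for gi, zi, ei, ci in zip(g, z0, eps, c))
    return grad_eps, grad_t


def optimize_noise_and_timestep(z0, c, steps=200, t0=0.5, seed=0):
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0) for _ in z0]  # initial Gaussian noise pattern
    t, lr = t0, 0.1
    loss = latent_loss(toy_edit(z0, c, eps, t), z0, c)
    history = [loss]
    for _ in range(steps):
        grad_eps, grad_t = gradients(z0, c, eps, t)
        # Joint gradient step on the noise and the timestep; keep t inside (0, 1).
        new_eps = [e - lr * ge for e, ge in zip(eps, grad_eps)]
        new_t = min(max(t - lr * grad_t, 0.02), 0.98)
        new_loss = latent_loss(toy_edit(z0, c, new_eps, new_t), z0, c)
        if new_loss < loss:  # accept the step and cautiously enlarge it
            eps, t, loss, lr = new_eps, new_t, new_loss, lr * 1.2
        else:                # reject the step and shrink it (simple backtracking)
            lr *= 0.5
        history.append(loss)
    return eps, t, history


z0 = [0.8, -0.3, 0.5, 0.1]   # hypothetical original-image latent
c = [-0.2, 0.6, 0.4, -0.7]   # hypothetical latent the edit prompt pulls toward
eps, t, history = optimize_noise_and_timestep(z0, c)
print(f"loss {history[0]:.4f} -> {history[-1]:.4f}, optimized t = {t:.3f}")
```

Because the loss is evaluated on latents, nothing inside the loop needs the VAE decoder; a pixel-domain loss would add a decoder pass per step, which is presumably a large part of the cost that latent-domain losses avoid.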

