CVFeb 13, 2024

Latent Inversion with Timestep-aware Sampling for Training-free Non-rigid Editing

arXiv:2402.08601v31 citationsh-index: 4
AI Analysis

This work addresses a specific challenge in image editing for users of diffusion models, offering an incremental improvement over existing methods.

The paper tackles the problem of text-guided non-rigid image editing with Stable Diffusion, which often struggles to preserve object identity and background, by proposing a training-free method that improves identity preservation without compromising editability.

Text-guided non-rigid editing involves complex edits for input images, such as changing motion or compositions within their surroundings. Since it requires manipulating the input structure, existing methods often struggle with preserving object identity and background, particularly when combined with Stable Diffusion. In this work, we propose a training-free approach for non-rigid editing with Stable Diffusion, aimed at improving the identity preservation quality without compromising editability. Our approach comprises three stages: text optimization, latent inversion, and timestep-aware text injection sampling. Inspired by the success of Imagic, we employ their text optimization for smooth editing. Then, we introduce latent inversion to preserve the input image's identity without additional model fine-tuning. To fully utilize the input reconstruction ability of latent inversion, we suggest timestep-aware text injection sampling. This effectively retains the structure of the input image by injecting the source text prompt in early sampling steps and then transitioning to the target prompt in subsequent sampling steps. This strategic approach seamlessly harmonizes with text optimization, facilitating complex non-rigid edits to the input without losing the original identity. We demonstrate the effectiveness of our method in terms of identity preservation, editability, and aesthetic quality through extensive experiments.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes