SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing
This work addresses a specific bottleneck in inversion-based editing for diffusion models, offering an incremental improvement for researchers and practitioners in image generation.
The paper tackles the error accumulation problem in DDIM inversion for text-to-image editing by proposing a method to disentangle guidance scales for source and target branches, improving performance on PIE-Bench without efficiency loss.
Diffusion models demonstrate impressive image generation performance with text guidance. Inspired by the learning process of diffusion, existing images can be edited according to text by DDIM inversion. However, vanilla DDIM inversion is not optimized for classifier-free guidance, and the error accumulated over the inversion steps degrades editing results. While many algorithms have been developed to improve the framework of DDIM inversion for editing, in this work we investigate the approximation error in DDIM inversion and propose to disentangle the guidance scale for the source and target branches, which reduces the error while keeping the original framework. Moreover, a better guidance scale (i.e., 0.5) than the default settings can be derived theoretically. Experiments on PIE-Bench show that our proposal can improve the performance of DDIM inversion dramatically without sacrificing efficiency.
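To make the idea of disentangled guidance scales concrete, the following is a minimal sketch rather than the paper's implementation. It assumes the common classifier-free guidance convention eps = eps_uncond + w * (eps_cond - eps_uncond), a deterministic DDIM update, and that the 0.5 value is the scale used on the source (inversion) branch; the abstract does not spell out these details. The names `toy_eps`, `invert_then_edit`, `source_scale`, and `target_scale` are hypothetical, and `toy_eps` is only a stand-in for a real text-conditioned noise predictor.

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance blend (assumed convention)."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

def ddim_step(x, eps, alpha_from, alpha_to):
    """Deterministic DDIM update between two cumulative alpha values.

    The same formula serves inversion (alpha_to < alpha_from, adding noise)
    and sampling (alpha_to > alpha_from, removing noise).
    """
    x0_pred = (x - np.sqrt(1.0 - alpha_from) * eps) / np.sqrt(alpha_from)
    return np.sqrt(alpha_to) * x0_pred + np.sqrt(1.0 - alpha_to) * eps

def toy_eps(x, cond):
    """Hypothetical stand-in for a text-conditioned noise predictor."""
    return 0.1 * x + cond

def invert_then_edit(x0, alphas, src_cond, tgt_cond, source_scale, target_scale):
    """Invert with the source prompt, then sample with the target prompt.

    `alphas` runs from nearly 1 (clean image) down to a small value (noisy).
    The two branches use separate guidance scales instead of a shared one.
    """
    x = x0
    # Source branch: DDIM inversion guided with source_scale.
    for a_from, a_to in zip(alphas[:-1], alphas[1:]):
        eps = cfg_noise(toy_eps(x, 0.0), toy_eps(x, src_cond), source_scale)
        x = ddim_step(x, eps, a_from, a_to)
    # Target branch: DDIM sampling guided with target_scale.
    rev = alphas[::-1]
    for a_from, a_to in zip(rev[:-1], rev[1:]):
        eps = cfg_noise(toy_eps(x, 0.0), toy_eps(x, tgt_cond), target_scale)
        x = ddim_step(x, eps, a_from, a_to)
    return x

alphas = np.linspace(0.999, 0.1, 11)
# source_scale=0.5 follows the abstract; target_scale=7.5 is an assumed default.
edited = invert_then_edit(np.ones(4), alphas, src_cond=1.0, tgt_cond=2.0,
                          source_scale=0.5, target_scale=7.5)
```

In a real pipeline `toy_eps` would be replaced by two forward passes of the denoising network (unconditional and text-conditioned); the only change relative to vanilla DDIM inversion in this sketch is that the two loops take different scale arguments, which is why no extra compute is introduced.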