LASPA: Latent Spatial Alignment for Fast Training-free Single Image Editing
LASPA offers a faster, more efficient approach to image editing, particularly on mobile devices, although the contribution is incremental because it builds on existing diffusion models.
The paper addresses the high computational cost of textual editing of real images with diffusion models. It introduces a training-free approach that uses latent spatial alignment to preserve image details, achieving 62-71% preference in a user study and better model-based editing scores.
We present a novel, training-free approach for textual editing of real images using diffusion models. Unlike prior methods that rely on computationally expensive finetuning, our approach leverages LAtent SPatial Alignment (LASPA) to efficiently preserve image details. We demonstrate how the diffusion process is amenable to spatial guidance using a reference image, leading to semantically coherent edits. This eliminates the need for complex optimization and costly model finetuning, resulting in significantly faster editing than previous methods. Additionally, our method avoids the storage requirements associated with large finetuned models. These advantages make our approach particularly well-suited for editing on mobile devices and for applications demanding rapid response times. While simple and fast, our method achieves 62-71% preference in a user study and significantly better model-based editing strength and image preservation scores.