FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models
This work targets text-based image editing that transfers across model architectures, a problem relevant to practitioners in computer vision and generative AI. The contribution is incremental in that it builds on existing pre-trained flow models.
The paper tackles editing real images with pre-trained text-to-image flow models without inversion or optimization. It introduces FlowEdit, which constructs an ODE that maps directly between the source and target distributions at a lower transport cost than the inversion approach, and reports state-of-the-art results with Stable Diffusion 3 and FLUX.
Editing real images using a pre-trained text-to-image (T2I) diffusion/flow model often involves inverting the image into its corresponding noise map. However, inversion by itself is typically insufficient for obtaining satisfactory results, and therefore many methods additionally intervene in the sampling process. Such methods achieve improved results but are not seamlessly transferable between model architectures. Here, we introduce FlowEdit, a text-based editing method for pre-trained T2I flow models, which is inversion-free, optimization-free and model agnostic. Our method constructs an ODE that directly maps between the source and target distributions (corresponding to the source and target text prompts) and achieves a lower transport cost than the inversion approach. This leads to state-of-the-art results, as we illustrate with Stable Diffusion 3 and FLUX. Code and examples are available on the project's webpage.
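The abstract does not spell out the ODE, so the following is only a minimal sketch of one plausible reading of "an ODE that directly maps between the source and target distributions". It runs on a toy problem and is not the authors' implementation. Several things here are assumptions: the function names `toy_velocity` and `edit_without_inversion` are hypothetical, a closed-form Gaussian velocity field stands in for a pre-trained T2I model, the time convention puts noise at t = 1 and data at t = 0, and the update rule is the difference between the target-prompt and source-prompt velocities evaluated at a noisy source point and a correspondingly shifted target point.

```python
import numpy as np


def toy_velocity(x, t, mu, sigma=0.5):
    """Exact rectified-flow velocity when the data are N(mu, sigma^2 I).

    Assumed convention: x_t = (1 - t) * x0 + t * noise, so t = 1 is pure
    noise and t = 0 is data. This is a stand-in for a pre-trained model
    conditioned on a prompt (here, the prompt is just the mean `mu`).
    """
    var_t = (1.0 - t) ** 2 * sigma**2 + t**2
    gain = (t - (1.0 - t) * sigma**2) / var_t
    return -mu + gain * (x - (1.0 - t) * mu)


def edit_without_inversion(x_src, mu_src, mu_tar, n_steps=50, seed=0):
    """Euler-integrate a velocity difference, starting from the source sample.

    Assumed update rule (not stated in the abstract): at each step, noise the
    source, shift the noisy point by the edit accumulated so far, and move
    along v_target - v_source. No noise map of x_src is ever recovered.
    """
    rng = np.random.default_rng(seed)
    ts = np.linspace(1.0, 0.0, n_steps + 1)  # integrate from t = 1 down to 0
    z_edit = x_src.copy()
    for t, t_next in zip(ts[:-1], ts[1:]):
        noise = rng.standard_normal(x_src.shape)
        z_src = (1.0 - t) * x_src + t * noise   # noisy version of the source
        z_tar = z_edit + z_src - x_src          # same noise, shifted by the edit
        v_delta = toy_velocity(z_tar, t, mu_tar) - toy_velocity(z_src, t, mu_src)
        z_edit = z_edit + (t_next - t) * v_delta
    return z_edit


# Toy "image": a 2-D point near the source distribution's mean.
x_src = np.array([1.2, -0.3])
mu_src = np.array([1.0, 0.0])    # stands in for the source prompt
mu_tar = np.array([-2.0, 3.0])   # stands in for the target prompt
edited = edit_without_inversion(x_src, mu_src, mu_tar)
print(edited)
```

In this toy setting the two velocity fields differ only by their means, so the integrated path moves the sample by `mu_tar - mu_src` and the edited point keeps its original offset from the mean. That is a simple picture of a short, structure-preserving transport path; with a real model, the velocities would come from the network evaluated under the source and target text prompts.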