Delta Velocity Rectified Flow for Text-to-Image Editing
This work addresses text-to-image editing and offers an incremental improvement over prior methods such as FlowEdit: it supplies a theoretical interpretation of FlowEdit and improves editing performance.
The authors tackle over-smoothing artifacts in text-to-image editing with Delta Velocity Rectified Flow (DVRF), an inversion-free framework that models the discrepancy between the source and target velocity fields and adds a time-dependent shift term. They report superior editing quality, fidelity, and controllability without architectural modifications.
We propose Delta Velocity Rectified Flow (DVRF), a novel inversion-free, path-aware editing framework within rectified flow models for text-to-image editing. DVRF is a distillation-based method that explicitly models the discrepancy between the source and target velocity fields in order to mitigate the over-smoothing artifacts common in prior distillation-based sampling approaches. We further introduce a time-dependent shift term that pushes noisy latents closer to the target trajectory, improving alignment with the target distribution. We show theoretically that when this shift is disabled, DVRF reduces to Delta Denoising Score (DDS), thereby bridging score-based diffusion optimization and velocity-based rectified-flow optimization. Moreover, when the shift term follows a linear schedule under rectified-flow dynamics, DVRF generalizes the inversion-free method FlowEdit and provides a principled theoretical interpretation of it. Experimental results indicate that DVRF achieves superior editing quality, fidelity, and controllability while requiring no architectural modifications, making it efficient and broadly applicable to text-to-image editing tasks. Code is available at https://github.com/Harvard-AI-and-Robotics-Lab/DeltaVelocityRectifiedFlow.
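The relationship between the delta-velocity update, the shift term, DDS, and FlowEdit can be illustrated with a small sketch. It is not the authors' implementation (see the linked repository for that) and rests on several assumptions. It uses the rectified-flow convention z_t = (1 - t) x0 + t * noise with velocity v = noise - x0 (t = 1 is pure noise). It also assumes the shift has the form eta(t) * (z - x_src), chosen here so that eta(t) = t reproduces FlowEdit's target latent, Z_t^tar = Z^FE + Z_t^src - X^src, while eta(t) = 0 leaves an unshifted, DDS-style delta-velocity update. The names `delta_velocity_edit`, `target_noisy_latent`, `toy_velocity`, `MEANS`, and the `eta` argument are all hypothetical, and latents are plain Python lists so the example runs without a deep learning framework.

```python
import random
from typing import Callable, List, Sequence

Vec = List[float]
# Hypothetical model interface: velocity(z_t, t, prompt) -> predicted v = noise - data.
VelocityFn = Callable[[Vec, float, str], Vec]
ShiftFn = Callable[[float], float]


def source_noisy_latent(x_src: Vec, noise: Vec, t: float) -> Vec:
    """Rectified-flow interpolation (1 - t) * x_src + t * noise."""
    return [(1 - t) * s + t * n for s, n in zip(x_src, noise)]


def target_noisy_latent(z: Vec, x_src: Vec, noise: Vec, t: float, eta_t: float) -> Vec:
    """Noisy edited latent plus an ASSUMED shift eta_t * (z - x_src) toward the target."""
    return [(1 - t) * zi + t * n + eta_t * (zi - s) for zi, n, s in zip(z, noise, x_src)]


def flowedit_target_latent(z: Vec, x_src: Vec, noise: Vec, t: float) -> Vec:
    """FlowEdit's target latent: Z_t^tar = Z^FE + Z_t^src - X^src."""
    z_src_t = source_noisy_latent(x_src, noise, t)
    return [zi + zs - s for zi, zs, s in zip(z, z_src_t, x_src)]


def delta_velocity_edit(
    x_src: Vec,
    velocity: VelocityFn,
    src_prompt: str,
    tgt_prompt: str,
    eta: ShiftFn,
    timesteps: Sequence[float],
    seed: int = 0,
) -> Vec:
    """Inversion-free edit driven by the target-minus-source velocity discrepancy."""
    rng = random.Random(seed)
    z = list(x_src)  # start from the source latent; no inversion is performed
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        noise = [rng.gauss(0.0, 1.0) for _ in x_src]  # same noise for both branches
        z_src_t = source_noisy_latent(x_src, noise, t_cur)
        z_tgt_t = target_noisy_latent(z, x_src, noise, t_cur, eta(t_cur))
        v_src = velocity(z_src_t, t_cur, src_prompt)
        v_tgt = velocity(z_tgt_t, t_cur, tgt_prompt)
        dt = t_next - t_cur  # negative: timesteps run from noise (t=1) toward data (t=0)
        z = [zi + dt * (vt - vs) for zi, vt, vs in zip(z, v_tgt, v_src)]
    return z


# Toy stand-in for a text-conditioned model: each prompt's data is a single point,
# for which the exact rectified-flow velocity is (z_t - mean) / t. Illustrative only.
MEANS = {"a cat": [1.0, -2.0], "a dog": [3.0, 0.5]}


def toy_velocity(z_t: Vec, t: float, prompt: str) -> Vec:
    return [(zi - m) / t for zi, m in zip(z_t, MEANS[prompt])]


if __name__ == "__main__":
    steps = [1.0 - i / 10 for i in range(11)]  # 1.0, 0.9, ..., 0.0
    edited = delta_velocity_edit(MEANS["a cat"], toy_velocity, "a cat", "a dog",
                                 eta=lambda t: t, timesteps=steps)  # linear shift
    print([round(v, 6) for v in edited])
```

In this toy setting, with the linear schedule eta(t) = t and the source latent placed at the source mean, the noise terms cancel between the two branches and the edit lands on the target mean up to floating-point error. Passing `eta=lambda t: 0.0` instead removes the shift and gives the unshifted DDS-style variant of the same loop; this sketch sweeps a fixed schedule in both cases for comparability, whereas DDS itself is usually run with randomly sampled timesteps.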