CV | GR | LG | Nov 21, 2024

Stable Flow: Vital Layers for Training-Free Image Editing

arXiv:2411.14430v2 | 89 citations | h-index: 30 | CVPR
Originality: Incremental advance
AI Analysis

This work addresses controlled image editing with diffusion models. It offers a training-free approach and is an incremental advance that builds on existing DiT and flow-matching techniques.

The paper exploits, rather than fixes, the limited generation diversity of Diffusion Transformer (DiT) models. It proposes an automatic method to identify 'vital layers' for selective attention feature injection, enabling consistent image edits that range from non-rigid modifications to object addition. It also introduces an improved image inversion method for flow models to support real-image editing. Effectiveness is evaluated through qualitative and quantitative comparisons and a user study.

Diffusion models have revolutionized the field of content synthesis and editing. Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT), and employed flow-matching for improved training and sampling. However, they exhibit limited generation diversity. In this work, we leverage this limitation to perform consistent image edits via selective injection of attention features. The main challenge is that, unlike the UNet-based models, DiT lacks a coarse-to-fine synthesis structure, making it unclear in which layers to perform the injection. Therefore, we propose an automatic method to identify "vital layers" within DiT, crucial for image formation, and demonstrate how these layers facilitate a range of controlled stable edits, from non-rigid modifications to object addition, using the same mechanism. Next, to enable real-image editing, we introduce an improved image inversion method for flow models. Finally, we evaluate our approach through qualitative and quantitative comparisons, along with a user study, and demonstrate its effectiveness across multiple applications. The project page is available at https://omriavrahami.com/stable-flow
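The abstract does not spell out how "vital layers" are detected, so the following is only a toy sketch under a stated assumption: a layer counts as vital if bypassing it (keeping just the residual path) changes the final output the most. Cosine similarity stands in for whatever perceptual metric a real pipeline would use, and every name here (`run_stack`, `rank_vital_layers`, `toy_layers`) is hypothetical and not taken from the paper or its code.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]  # maps the current state to a residual update


def run_stack(layers: Sequence[Layer], x: Vector, skip: int = -1) -> Vector:
    """Apply residual layers x <- x + f(x); the layer at index `skip` is bypassed."""
    for i, layer in enumerate(layers):
        if i == skip:
            continue  # bypass: only the identity (residual) path is kept
        update = layer(x)
        x = [a + b for a, b in zip(x, update)]
    return x


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm


def rank_vital_layers(layers: Sequence[Layer], x: Vector) -> List[int]:
    """Return layer indices ordered from most to least vital (assumed criterion)."""
    reference = run_stack(layers, x)
    scores = []
    for i in range(len(layers)):
        ablated = run_stack(layers, x, skip=i)
        scores.append((cosine_similarity(reference, ablated), i))
    # Lowest similarity to the unmodified run means the largest impact.
    return [i for _, i in sorted(scores)]


# Toy stand-ins for transformer blocks (assumptions for illustration only).
toy_layers: List[Layer] = [
    lambda v: [0.01 * c for c in v],           # tiny rescaling
    lambda v: [-2.0 * v[1], 2.0 * v[0]],       # large, direction-changing update
    lambda v: [0.05 * v[1], 0.05 * v[0]],      # small mixing update
]

ranking = rank_vital_layers(toy_layers, [1.0, 0.0])
print(ranking)
```

In this toy setup the direction-changing layer (index 1) ranks first, since removing it alters the output far more than removing either small update. A real DiT would swap the vectors for generated images and compare them with a perceptual metric, then restrict attention feature injection to the top-ranked layers.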

Code Implementations: 1 repo
