RegionDrag: Fast Region-Based Image Editing with Diffusion Models
This work addresses inefficiencies in interactive image editing and offers a faster, more precise alternative to existing methods. It is, however, incremental, since it builds on existing diffusion models and attention-swapping techniques.
The paper tackles the computational overhead and ambiguity of point-drag-based image editing by proposing RegionDrag, a region-based copy-and-paste method. RegionDrag completes an edit on a 512x512 image in less than 2 seconds, more than 100x faster than DragDiffusion, while also improving accuracy and alignment with user intentions.
Point-drag-based image editing methods, like DragDiffusion, have attracted significant attention. However, point-drag-based approaches suffer from computational overhead and misinterpretation of user intentions due to the sparsity of point-based editing instructions. In this paper, we propose a region-based copy-and-paste dragging method, RegionDrag, to overcome these limitations. RegionDrag allows users to express their editing instructions in the form of handle and target regions, enabling more precise control and alleviating ambiguity. In addition, region-based operations complete editing in one iteration and are much faster than point-drag-based methods. We also incorporate the attention-swapping technique for enhanced stability during editing. To validate our approach, we extend existing point-drag-based datasets with region-based dragging instructions. Experimental results demonstrate that RegionDrag outperforms existing point-drag-based approaches in terms of speed, accuracy, and alignment with user intentions. Remarkably, RegionDrag completes the edit on an image with a resolution of 512x512 in less than 2 seconds, which is more than 100x faster than DragDiffusion, while achieving better performance. Project page: https://visual-ai.github.io/regiondrag.
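The two ingredients named in the abstract, a one-pass region copy-and-paste and attention swapping, can be illustrated with a small NumPy sketch. This is not the paper's implementation. It assumes, for simplicity, that handle and target regions are axis-aligned boxes on a latent grid (the paper's regions may be arbitrary shapes, and its mapping and blending may differ). It also assumes a MasaCtrl-style swap in which queries come from the editing branch while keys and values come from the source (reconstruction) branch. The function names `region_copy_paste` and `swapped_self_attention` are hypothetical.

```python
import numpy as np


def region_copy_paste(latent, handle, target):
    """Hypothetical sketch: copy latent features from a handle box into a target box.

    Boxes are (top, left, bottom, right) in latent-grid coordinates, with
    bottom/right exclusive. Each target cell takes the handle cell at the same
    relative position (nearest neighbour), so the two boxes may differ in size.
    The whole edit is a single indexing pass with no iterative optimisation.
    """
    out = latent.copy()  # leave the input latent untouched
    ht, hl, hb, hr = handle
    tt, tl, tb, tr = target
    hh, hw = hb - ht, hr - hl
    th, tw = tb - tt, tr - tl
    # Map every target row/column to a handle row/column by relative position.
    rows = ht + (np.arange(th) * hh) // th
    cols = hl + (np.arange(tw) * hw) // tw
    # Broadcast rows x cols into a (th, tw) grid; channels are kept via "...".
    out[..., tt:tb, tl:tr] = latent[..., rows[:, None], cols[None, :]]
    return out


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def swapped_self_attention(q_edit, k_src, v_src):
    """Assumed MasaCtrl-style attention swap (not taken from RegionDrag's code).

    Queries (N, d) come from the editing branch; keys (M, d) and values (M, d_v)
    come from the source branch, so the edited image attends to the original
    image's content. Returns an (N, d_v) array.
    """
    d = q_edit.shape[-1]
    weights = softmax(q_edit @ k_src.T / np.sqrt(d))  # (N, M), rows sum to 1
    return weights @ v_src


# Usage, assuming a 4-channel 64x64 latent (e.g. a Stable Diffusion latent of a
# 512x512 image); the box coordinates are illustrative only.
rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 64, 64))
edited = region_copy_paste(latent, handle=(10, 10, 20, 20), target=(30, 30, 40, 40))
```

Because the copy is a single gather over the target box, its cost does not depend on an optimisation loop. This is the property the abstract contrasts with iterative point-drag methods, although a full pipeline would still need inversion and denoising around this step.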