CV · Sep 26, 2025

FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing

Junyi Wu, Zhiteng Li, Haotong Qin, Xiaohong Liu, Linghe Kong, Yulun Zhang, Xiaokang Yang

arXiv:2509.22244v3 · 2 citations · h-index: 6 · Has Code

Originality: Highly original

AI Analysis

FlashEdit addresses the prohibitive latency that keeps diffusion-based, text-guided image editing out of real-world applications, offering a large speedup over prior multi-step methods.

FlashEdit tackles the high latency of text-guided image editing with diffusion models. The framework performs real-time edits in under 0.2 seconds, an over 150x speedup compared to prior multi-step methods, while maintaining background consistency and structural integrity.

Text-guided image editing with diffusion models has achieved remarkable quality but suffers from prohibitive latency, hindering real-world applications. We introduce FlashEdit, a novel framework designed to enable high-fidelity, real-time image editing. Its efficiency stems from three key innovations: (1) a One-Step Inversion-and-Editing (OSIE) pipeline that bypasses costly iterative processes; (2) a Background Shield (BG-Shield) technique that guarantees background preservation by selectively modifying features only within the edit region; and (3) a Sparsified Spatial Cross-Attention (SSCA) mechanism that ensures precise, localized edits by suppressing semantic leakage to the background. Extensive experiments demonstrate that FlashEdit maintains superior background consistency and structural integrity, while performing edits in under 0.2 seconds, which is an over 150× speedup compared to prior multi-step methods. Our code will be made publicly available at https://github.com/JunyiWuCode/FlashEdit.
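
The abstract does not spell out how BG-Shield is formulated. As a rough illustration of what "selectively modifying features only within the edit region" can mean, the sketch below applies a generic binary-mask blend to two small feature grids. The function name `masked_blend`, the grid representation, and the toy values are all hypothetical and are not taken from the FlashEdit code. It is a minimal sketch of the general idea, not the authors' method.

```python
from typing import List

Grid = List[List[float]]


def masked_blend(source: Grid, edited: Grid, mask: Grid) -> Grid:
    """Take `edited` values where mask is 1 and keep `source` values where mask is 0.

    Hypothetical helper for illustration only; a real editing model would
    operate on multi-channel feature tensors, not nested Python lists.
    """
    if not (len(source) == len(edited) == len(mask)):
        raise ValueError("all inputs must have the same number of rows")
    out: Grid = []
    for s_row, e_row, m_row in zip(source, edited, mask):
        if not (len(s_row) == len(e_row) == len(m_row)):
            raise ValueError("all inputs must have the same number of columns")
        # Per-cell convex combination: m selects the edited value, (1 - m) the original.
        out.append([m * e + (1.0 - m) * s for s, e, m in zip(s_row, e_row, m_row)])
    return out


# Toy 2x2 example: only the top-right cell lies inside the edit region.
source = [[1.0, 2.0], [3.0, 4.0]]
edited = [[9.0, 9.0], [9.0, 9.0]]
mask = [[0.0, 1.0], [0.0, 0.0]]
blended = masked_blend(source, edited, mask)
```

With a strictly binary mask, every cell outside the edit region is copied from the source unchanged, which is the sense in which a mask-based scheme can preserve the background exactly.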
