CV · GR · Aug 1, 2024

TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models

arXiv:2408.00735v1 · 69 citations · h-index: 21
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck for users who need efficient text-based image editing with fast diffusion models, though it is incremental because it builds on existing "edit-friendly" frameworks.

The paper tackles the challenge of applying text-based image editing to fast-sampling diffusion models, where existing approaches often produce visual artifacts and weak edits. It proposes a shifted noise schedule to fix the artifacts and a pseudo-guidance method to boost editing strength, enabling editing in as few as three steps.

Diffusion models have opened the path to a wide range of text-based image editing frameworks. However, these typically build on the multi-step nature of the diffusion backwards process, and adapting them to distilled, fast-sampling methods has proven surprisingly challenging. Here, we focus on a popular line of text-based editing frameworks: the "edit-friendly" DDPM-noise inversion approach. We analyze its application to fast sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength. We trace the artifacts to mismatched noise statistics between inverted noises and the expected noise schedule, and suggest a shifted noise schedule which corrects for this offset. To increase editing strength, we propose a pseudo-guidance approach that efficiently increases the magnitude of edits without introducing new artifacts. All in all, our method enables text-based image editing with as few as three diffusion steps, while providing novel insights into the mechanisms behind popular text-based editing approaches.
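To make the "inverted noises" in the abstract concrete, here is a minimal sketch of edit-friendly DDPM-noise inversion, the baseline the paper builds on. It is not the TurboEdit method itself: it covers neither the shifted noise schedule nor pseudo-guidance. The names `make_schedule`, `toy_eps_model`, `posterior_mean`, `edit_friendly_inversion`, and `replay` are hypothetical. The linear noise predictor, the beta values, and the choice of sigma_t = sqrt(beta_t) are assumptions made only to keep the example self-contained.

```python
import numpy as np

def make_schedule(num_steps, beta_start=0.1, beta_end=0.7):
    """Toy linear beta schedule; the values are illustrative assumptions."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)

def toy_eps_model(x_t, t):
    """Hypothetical stand-in for a trained noise-prediction network."""
    return 0.1 * x_t

def posterior_mean(x_t, t, betas, alphas, alpha_bars):
    """DDPM reverse-step mean computed from the predicted noise."""
    eps = toy_eps_model(x_t, t)
    return (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])

def edit_friendly_inversion(x0, betas, alphas, alpha_bars, rng):
    """Noise x0 independently at every step, then solve for the noise maps z_t."""
    num_steps = len(betas)
    xs = [x0]  # xs[t + 1] is the noisy image at schedule index t
    for t in range(num_steps):
        noise = rng.standard_normal(x0.shape)  # fresh noise per step
        xs.append(np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise)
    zs = []
    for t in range(num_steps):
        mu = posterior_mean(xs[t + 1], t, betas, alphas, alpha_bars)
        zs.append((xs[t] - mu) / np.sqrt(betas[t]))  # assumes sigma_t = sqrt(beta_t)
    return xs, zs

def replay(x_last, zs, betas, alphas, alpha_bars):
    """Run the reverse process with the extracted noise maps."""
    x = x_last
    for t in reversed(range(len(zs))):
        x = posterior_mean(x, t, betas, alphas, alpha_bars) + np.sqrt(betas[t]) * zs[t]
    return x

rng = np.random.default_rng(0)
betas, alphas, alpha_bars = make_schedule(num_steps=4)
x0 = rng.standard_normal((8, 8))  # stand-in for an image or latent
xs, zs = edit_friendly_inversion(x0, betas, alphas, alpha_bars, rng)
reconstruction = replay(xs[-1], zs, betas, alphas, alpha_bars)
# Empirical spread of each extracted noise map, for comparison with the
# unit variance a standard sampler would assume.
noise_stds = [float(z.std()) for z in zs]
```

Replaying the extracted noise maps recovers the input by construction. In this family of methods, an edit reuses the same maps while changing the text condition fed to the noise predictor. The values in `noise_stds` give a rough way to inspect the kind of noise-statistics mismatch the abstract refers to, though this toy setup says nothing about how a real distilled model behaves.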

Code Implementations: 1 repo
