CV · Jan 18, 2024

Wavelet-Guided Acceleration of Text Inversion in Diffusion-Based Image Editing

arXiv:2401.09794v1 · 13 citations · ICASSP
Originality: Incremental advance
AI Analysis

This work improves editing speed for users of diffusion-based image editing tools. It is an incremental advance because it builds on the existing NTI and NPI concepts.

The paper tackles the slow speed of Null-text Inversion (NTI) in diffusion-based image editing, where NTI takes more than two minutes per image. It introduces a wavelet-guided method that reduces the average editing time by over 80% relative to NTI while maintaining comparable performance.

In the field of image editing, Null-text Inversion (NTI) enables fine-grained editing while preserving the structure of the original image by optimizing null embeddings during the DDIM sampling process. However, the NTI process is time-consuming, taking more than two minutes per image. To address this, we introduce an innovative method that maintains the principles of NTI while accelerating the image editing process. We propose the WaveOpt-Estimator, which determines the text optimization endpoint based on frequency characteristics. Utilizing wavelet transform analysis to identify the image's frequency characteristics, we can limit text optimization to specific timesteps during the DDIM sampling process. Adopting the Negative-Prompt Inversion (NPI) concept, we use a target prompt representing the original image as the initial text value for optimization. This approach maintains performance comparable to NTI while reducing the average editing time by over 80% compared to the NTI method. Our method presents a promising approach for efficient, high-quality image editing based on diffusion models.
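The idea of choosing an optimization endpoint from an image's frequency characteristics can be illustrated with a minimal sketch. It is not the authors' WaveOpt-Estimator, whose internals the abstract does not describe. The sketch uses a one-level Haar wavelet transform to measure the share of image energy in the high-frequency subbands. The function names, the linear mapping from that share to an endpoint, its direction (more high-frequency content gives a later endpoint), and the defaults `num_steps=50` and `min_steps=5` are all assumptions made for illustration.

```python
def haar_dwt2(image):
    """One-level 2D Haar transform of a grayscale image.

    `image` is a list of equal-length rows with even height and width.
    Returns (approx, details): flat lists holding the low-frequency
    coefficient and the three detail coefficients of each 2x2 block.
    """
    approx, details = [], []
    for i in range(0, len(image), 2):
        for j in range(0, len(image[0]), 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            approx.append((a + b + c + d) / 2)
            details.extend([
                (a + b - c - d) / 2,  # top row minus bottom row
                (a - b + c - d) / 2,  # left column minus right column
                (a - b - c + d) / 2,  # diagonal difference
            ])
    return approx, details


def high_freq_ratio(image):
    """Fraction of total energy carried by the detail subbands (0.0 to 1.0)."""
    approx, details = haar_dwt2(image)
    low = sum(x * x for x in approx)
    high = sum(x * x for x in details)
    total = low + high
    return high / total if total > 0 else 0.0


def estimate_endpoint(ratio, num_steps=50, min_steps=5):
    """HYPOTHETICAL rule, not the paper's estimator: map the high-frequency
    ratio linearly to the number of sampling steps that receive text
    optimization. Direction and constants are illustrative assumptions."""
    ratio = min(max(ratio, 0.0), 1.0)
    return min_steps + int(ratio * (num_steps - min_steps))


# Usage: a flat image versus a +1/-1 checkerboard (toy 4x4 inputs).
flat = [[1.0] * 4 for _ in range(4)]
checker = [[1.0 if (i + j) % 2 == 0 else -1.0 for j in range(4)] for i in range(4)]
for name, img in (("flat", flat), ("checker", checker)):
    r = high_freq_ratio(img)
    print(name, r, estimate_endpoint(r))
```

Under these assumptions, a smooth image gets the minimum number of optimized steps and a highly textured one gets all of them. The remaining sampling steps would skip text optimization, which is where a time saving of the kind the abstract reports could come from.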

