CV · Jan 7, 2025

Exploring Iterative Manifold Constraint for Zero-shot Image Editing

arXiv:2501.03631v2 · 1 citation · h-index: 6
Originality: Incremental advance
AI Analysis

This addresses a key challenge in image editing for AI applications, offering an incremental improvement over current methods.

The paper tackles the trade-off between editability and fidelity in text-driven image editing by proposing ZZEdit, a zero-shot method that uses an intermediate-inverted latent as a pivot and alternates denoising and inversion, achieving improved performance over existing 'inversion-then-editing' pipelines in diverse scenarios.

Editability and fidelity are two essential demands of text-driven image editing: the edited region should align with the target prompt, while the rest of the image should remain unchanged. Current state-of-the-art editing methods usually follow an "inversion-then-editing" pipeline, in which the input image is inverted to an approximately Gaussian noise ${z}_T$, from which a sampling process is run under the target prompt. We argue, however, that near-Gaussian noise is a poor editing pivot because it introduces substantial fidelity errors. A pilot analysis supports this: intermediate-inverted latents achieve a better trade-off between editability and fidelity than the fully-inverted ${z}_T$. Based on this, we propose a novel zero-shot editing paradigm dubbed ZZEdit. It first locates a qualified intermediate-inverted latent, denoted ${z}_p$, as a better editing pivot, one that is sufficient for editing while preserving structure. A ZigZag process then executes denoising and inversion alternately, progressively injecting target guidance into ${z}_p$ while preserving the structure information of step $p$. Afterwards, so that the numbers of inversion and denoising steps match, we execute a pure sampling process under the target prompt. Essentially, ZZEdit performs an iterative manifold constraint between the manifolds $M_{p}$ and $M_{p-1}$, leading to fewer fidelity errors. Extensive experiments highlight the effectiveness of ZZEdit in diverse image editing scenarios compared with the "inversion-then-editing" pipeline.
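
The three stages described in the abstract (partial inversion to a pivot, ZigZag alternation, pure sampling) can be sketched as a control-flow skeleton. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `zigzag_edit`, `invert_step`, `denoise_step`, and `rounds` are hypothetical, the pivot step `p` is taken as an input because the abstract does not say how it is located, and each ZigZag round is assumed to go from step `p` to `p-1` and back. The toy step functions simply add or subtract 1 so that the skeleton runs without a diffusion model.

```python
from typing import Callable, List, Tuple

Latent = float  # stand-in for a latent tensor (assumption for this sketch)
Step = Callable[[Latent, int], Latent]


def zigzag_edit(
    z0: Latent,
    p: int,
    rounds: int,
    invert_step: Step,   # hypothetical: one inversion step, t -> t + 1
    denoise_step: Step,  # hypothetical: one target-prompt denoising step, t -> t - 1
) -> Tuple[Latent, List[Tuple[str, int, int]]]:
    """Run the assumed three-stage schedule and record every step taken."""
    if p < 1 or rounds < 0:
        raise ValueError("p must be >= 1 and rounds must be >= 0")

    trace: List[Tuple[str, int, int]] = []
    z = z0

    # Stage 1: invert only up to the pivot step p (not all the way to z_T).
    for t in range(p):
        z = invert_step(z, t)
        trace.append(("invert", t, t + 1))

    # Stage 2: ZigZag, alternating one denoising and one inversion step
    # between step p and step p - 1 (assumed ordering).
    for _ in range(rounds):
        z = denoise_step(z, p)
        trace.append(("denoise", p, p - 1))
        z = invert_step(z, p - 1)
        trace.append(("invert", p - 1, p))

    # Stage 3: pure sampling from the pivot back to step 0.
    for t in range(p, 0, -1):
        z = denoise_step(z, t)
        trace.append(("denoise", t, t - 1))

    return z, trace


# Toy step functions so the skeleton runs; a real pipeline would call
# a diffusion model's inversion and sampling updates here instead.
toy_invert: Step = lambda z, t: z + 1.0
toy_denoise: Step = lambda z, t: z - 1.0

z_out, trace = zigzag_edit(0.0, p=3, rounds=2,
                           invert_step=toy_invert, denoise_step=toy_denoise)
n_invert = sum(1 for kind, _, _ in trace if kind == "invert")
n_denoise = sum(1 for kind, _, _ in trace if kind == "denoise")
```

Under these assumptions the schedule takes `p + rounds` inversion steps and `rounds + p` denoising steps, so the two counts balance, which is in the spirit of the abstract's remark about matching step numbers.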

