CV · Dec 26, 2025

ProEdit: Inversion-based Editing From Prompts Done Right

arXiv:2512.22118v1 · 3 citations · h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in training-free visual editing for users who need precise image modifications, though it is an incremental advance that builds on existing inversion methods.

The paper tackles inversion-based visual editing, where existing methods rely too heavily on source image information, which hinders edits such as changes in pose or color. The proposed method, ProEdit, achieves state-of-the-art performance on image and video editing benchmarks.

Inversion-based visual editing provides an effective and training-free way to edit an image or a video based on user instructions. Existing methods typically inject source image information during the sampling process to maintain editing consistency. However, this sampling strategy relies too heavily on source information, which negatively affects the edits in the target image (e.g., failing to change the subject's attributes such as pose, number, or color as instructed). In this work, we propose ProEdit to address this issue in both the attention and the latent aspects. In the attention aspect, we introduce KV-mix, which mixes the KV features of the source and the target in the edited region, mitigating the influence of the source image on the edited region while maintaining background consistency. In the latent aspect, we propose Latents-Shift, which perturbs the edited region of the source latent, eliminating the influence of the inverted latent on sampling. Extensive experiments on several image and video editing benchmarks demonstrate that our method achieves state-of-the-art (SOTA) performance. In addition, our design is plug-and-play and can be seamlessly integrated into existing inversion and editing methods, such as RF-Solver, FireFlow, and UniEdit.
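The abstract describes KV-mix and Latents-Shift only at a high level, so the Python sketch below is an illustration under stated assumptions, not ProEdit's implementation. It assumes features are lists of per-token vectors, that a binary token mask marking the edited region is already available, that KV-mix is a per-token linear blend of target and source keys/values with a hypothetical weight `alpha`, and that Latents-Shift blends the masked tokens of the inverted latent with fresh Gaussian noise using a hypothetical weight `beta`. The names `blend`, `kv_mix`, `attention`, and `latents_shift` are likewise hypothetical.

```python
# Hypothetical sketch of ProEdit's two ideas; the blend formulas and the
# weights alpha/beta are assumptions, not taken from the paper.
import math
import random


def blend(a, b, w):
    """Elementwise (1 - w) * a + w * b for two equal-length vectors."""
    return [(1 - w) * x + w * y for x, y in zip(a, b)]


def kv_mix(k_src, v_src, k_tgt, v_tgt, mask, alpha=0.5):
    """Assumed KV-mix: inside the mask, blend target K/V into source K/V
    (alpha = target weight); outside the mask, keep source K/V unchanged."""
    k_mix, v_mix = [], []
    for ks, vs, kt, vt, m in zip(k_src, v_src, k_tgt, v_tgt, mask):
        if m:
            k_mix.append(blend(ks, kt, alpha))
            v_mix.append(blend(vs, vt, alpha))
        else:
            k_mix.append(list(ks))
            v_mix.append(list(vs))
    return k_mix, v_mix


def attention(q, k, v):
    """Single-head scaled dot-product attention over lists of token vectors."""
    scale = 1 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v)) / total
                    for d in range(len(v[0]))])
    return out


def latents_shift(z_src, mask, beta=0.5, rng=None):
    """Assumed Latents-Shift: blend edited-region tokens of the inverted latent
    with fresh Gaussian noise (beta = noise weight); leave other tokens as-is."""
    if rng is None:
        rng = random.Random()
    shifted = []
    for z, m in zip(z_src, mask):
        if m:
            noise = [rng.gauss(0.0, 1.0) for _ in z]
            shifted.append(blend(z, noise, beta))
        else:
            shifted.append(list(z))
    return shifted


if __name__ == "__main__":
    rng = random.Random(0)
    tokens, dim = 6, 4
    mask = [0, 0, 1, 1, 0, 0]  # tokens 2 and 3 form the edited region

    def rand_feats():
        return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(tokens)]

    k_src, v_src, k_tgt, v_tgt, q_tgt = (rand_feats() for _ in range(5))
    k_mix, v_mix = kv_mix(k_src, v_src, k_tgt, v_tgt, mask, alpha=0.7)
    edited = attention(q_tgt, k_mix, v_mix)  # target queries attend over mixed K/V
    z_start = latents_shift(rand_feats(), mask, beta=0.5, rng=rng)
    print(len(edited), len(edited[0]), len(z_start))
```

In this sketch, `alpha = 0` reduces KV-mix to injecting the source keys and values unchanged, which mirrors the source-heavy behavior the abstract criticizes; raising `alpha` gives the target prompt more control inside the mask, while background tokens always keep their source features.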
