CV | Sep 28, 2023

KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing

arXiv:2309.16608v1 | 31 citations | h-index: 11
Originality: Incremental advance
AI Analysis

The paper addresses a practical limitation in image editing for users who need precise action modifications, though the contribution appears incremental because it builds on existing diffusion models.

The paper tackles text-conditioned action editing of real images, where existing methods fail to produce results that match the action semantics of the prompt while preserving the content of the original image. It proposes KV Inversion, which achieves satisfactory reconstruction and editing without training the Stable Diffusion model or using large-scale datasets.

Text-conditioned image editing is a recently emerged and highly practical task with considerable potential. However, most concurrent methods are unable to perform action editing, i.e. they cannot produce results that both conform to the action semantics of the editing prompt and preserve the content of the original image. To address action editing, we propose KV Inversion, a method that achieves satisfactory reconstruction performance and action editing. It solves two major problems: 1) the edited result matches the corresponding action, and 2) the edited object retains the texture and identity of the original real image. In addition, our method requires neither training the Stable Diffusion model itself nor scanning a large-scale dataset for time-consuming training.
