CV | Dec 19, 2023

RealCraft: Attention Control as A Tool for Zero-Shot Consistent Video Editing

arXiv:2312.12635v4 | 2 citations | h-index: 25 | ICONIP
Originality: Incremental advance
AI Analysis

The paper addresses the problem of maintaining structural consistency when editing real-world videos, and represents an incremental improvement over existing methods.

The paper tackles the challenge of applying text-to-image models to real-world video editing by proposing RealCraft, an attention-control-based method that achieves localized shape-wise edits with enhanced temporal consistency. The method uses Stable Diffusion directly, requires no additional information, and is demonstrated on videos of up to 64 frames.

Even though large-scale text-to-image generative models show promising performance in synthesizing high-quality images, applying these models directly to image editing remains a significant challenge. This challenge is further amplified in video editing due to the additional dimension of time. This is especially the case for editing real-world videos as it necessitates maintaining a stable structural layout across frames while executing localized edits without disrupting the existing content. In this paper, we propose RealCraft, an attention-control-based method for zero-shot real-world video editing. By swapping cross-attention for new feature injection and relaxing spatial-temporal attention of the editing object, we achieve localized shape-wise edit along with enhanced temporal consistency. Our model directly uses Stable Diffusion and operates without the need for additional information. We showcase the proposed zero-shot attention-control-based method across a range of videos, demonstrating shape-wise, time-consistent and parameter-free editing in videos of up to 64 frames.
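The abstract describes cross-attention swapping only at a high level. As a rough illustration of the general idea, the NumPy sketch below keeps the source prompt's attention maps for unchanged tokens and takes the edit prompt's maps for the edited tokens. This is not the authors' implementation: the function names, the toy shapes, the shared queries, and the assumption that both prompts are token-aligned and of equal length are all hypothetical simplifications.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_maps(queries, keys):
    """Attention of each spatial query over prompt tokens, shape (pixels, tokens)."""
    dim = queries.shape[-1]
    return softmax(queries @ keys.T / np.sqrt(dim), axis=-1)


def swap_cross_attention(source_maps, edit_maps, edited_token_ids):
    """Keep source maps for unchanged tokens; use edit maps for edited tokens.

    Assumes (hypothetically) that both prompts have the same number of
    tokens and that token positions are aligned. Rows of the result are
    not renormalized, so they may no longer sum to exactly 1.
    """
    mixed = source_maps.copy()
    mixed[:, edited_token_ids] = edit_maps[:, edited_token_ids]
    return mixed


# Toy example: 16 spatial positions, 5 prompt tokens, feature dimension 8.
rng = np.random.default_rng(0)
pixels, tokens, dim = 16, 5, 8
queries = rng.normal(size=(pixels, dim))
source_keys = rng.normal(size=(tokens, dim))
edit_keys = source_keys.copy()
edit_keys[2] = rng.normal(size=dim)  # token 2 stands in for the edited word

source_maps = cross_attention_maps(queries, source_keys)
edit_maps = cross_attention_maps(queries, edit_keys)
mixed_maps = swap_cross_attention(source_maps, edit_maps, [2])
```

In this toy setup, preserving the source maps for unchanged tokens stands in for keeping the original layout, while the swapped column carries the new content. The relaxation of spatial-temporal attention for the edited object mentioned in the abstract is not modeled here.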

