CV, AI | Jun 26, 2025

DFVEdit: Conditional Delta Flow Vector for Zero-shot Video Editing

arXiv:2506.20967v2 | 2 citations | h-index: 17
Originality: Highly original
AI Analysis

This work is aimed at AI researchers and practitioners concerned with the efficiency of video editing. The method itself is new, but its contribution is an incremental improvement on existing techniques.

The paper tackles the high computational overhead of video editing with Video Diffusion Transformers. It proposes DFVEdit, an efficient zero-shot method that requires neither attention modification nor fine-tuning. Compared with attention-engineering-based editing methods, it reports at least 20x faster inference and 85% lower memory use, along with state-of-the-art performance.

The advent of Video Diffusion Transformers (Video DiTs) marks a milestone in video generation. However, directly applying existing video editing methods to Video DiTs often incurs substantial computational overhead, due to resource-intensive attention modification or fine-tuning. To alleviate this problem, we present DFVEdit, an efficient zero-shot video editing method tailored for Video DiTs. DFVEdit eliminates the need for both attention modification and fine-tuning by directly operating on clean latents via flow transformation. To be more specific, we observe that editing and sampling can be unified under the continuous flow perspective. Building upon this foundation, we propose the Conditional Delta Flow Vector (CDFV), a theoretically unbiased estimation of DFV, and integrate Implicit Cross Attention (ICA) guidance as well as Embedding Reinforcement (ER) to further enhance editing quality. DFVEdit excels in practical efficiency, offering at least 20x inference speed-up and 85% memory reduction on Video DiTs compared to attention-engineering-based editing methods. Extensive quantitative and qualitative experiments demonstrate that DFVEdit can be seamlessly applied to popular Video DiTs (e.g., CogVideoX and Wan2.1), attaining state-of-the-art performance on structural fidelity, spatial-temporal consistency, and editing quality.
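
The abstract does not spell out how a clean latent is edited "via flow transformation", so the following is only a toy sketch of one plausible reading, not the paper's CDFV, ICA, or ER. It assumes a linear interpolation x_t = (1 - t) * x0 + t * noise, and it takes a "delta flow vector" to be the difference between a target-conditioned and a source-conditioned velocity prediction. The names `toy_velocity` and `edit_clean_latent`, the stand-in model, and the Euler schedule are all hypothetical.

```python
import random


def toy_velocity(x_t, t, cond):
    """Hypothetical stand-in for a conditioned flow model (not a Video DiT).

    Assumes x_t = (1 - t) * x0 + t * noise and a "model" that believes the
    clean sample for condition `cond` is `cond` itself, so its predicted
    velocity (noise - x0) equals (x_t - cond) / t.
    """
    return [(x - c) / t for x, c in zip(x_t, cond)]


def edit_clean_latent(z_src, cond_src, cond_tgt, num_steps=50, seed=0):
    """Move a clean latent by Euler-integrating a velocity difference.

    The latent is never inverted to noise: at each step the source and the
    edited latent are noised with the SAME noise sample, and only the
    difference of the two velocity predictions updates the edited latent.
    """
    rng = random.Random(seed)
    z_edit = list(z_src)
    for k in range(num_steps):
        t = (num_steps - k) / num_steps            # from 1 down to 1/num_steps
        t_next = (num_steps - k - 1) / num_steps   # last step lands on 0
        noise = [rng.gauss(0.0, 1.0) for _ in z_src]
        x_src = [(1 - t) * z + t * n for z, n in zip(z_src, noise)]
        x_edit = [(1 - t) * z + t * n for z, n in zip(z_edit, noise)]
        v_src = toy_velocity(x_src, t, cond_src)
        v_tgt = toy_velocity(x_edit, t, cond_tgt)
        dt = t_next - t                            # negative: t decreases
        z_edit = [z + dt * (vt - vs)
                  for z, vt, vs in zip(z_edit, v_tgt, v_src)]
    return z_edit


if __name__ == "__main__":
    source = [0.3, -1.2, 2.0]
    print(edit_clean_latent(source, [0.0, 0.0, 0.0], [1.0, -1.0, 0.5]))
```

In this toy setting the shared noise cancels in the velocity difference, so identical source and target conditions leave the latent unchanged, while differing conditions shift it by approximately `cond_tgt - cond_src` and preserve the source's offset from its condition. A real Video DiT is nonlinear, so nothing here says how the actual method behaves.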

