CV · AI · May 15

Tuning-free Instruction-based Video Editing Via Structural Noise Initialization and Guidance

Song Wu, Xinyu Chen, Qian Wang, Liang Li, Zili Yi, Junlan Feng

arXiv:2605.15533 · 67.5

AI Analysis

The paper addresses the challenge of instruction-based video editing without fine-tuning, offering a practical option for content creators.

This paper proposes a tuning-free, instruction-based video editing framework that uses a Structural Noise Initialization Strategy and Noise Guidance Mechanism to improve editing quality and consistency, achieving state-of-the-art performance.

Video editing poses a significant challenge. While a series of tuning-free methods circumvent the need for extensive data collection and model training, they often underutilize the rich information embedded in the noisy latent, leading to unsatisfactory results. To address this, we propose a tuning-free, instruction-based video editing framework that approaches video editing from the perspective of the noisy latent. First, we design a Structural Noise Initialization Strategy (SNIS) to secure a superior editing starting point by assigning higher noise levels to edited regions (to facilitate content change) and lower noise levels to unedited regions (to maintain content consistency). Second, we introduce a Noise Guidance Mechanism (NGM), which leverages the video prior in the generative model and effectively integrates the rich information within the noisy latent to guide the denoising process, thereby preserving unedited content and overall visual coherence. Experiments show that our proposed method achieves better visual quality and state-of-the-art performance.
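The region-dependent noise assignment described for SNIS can be illustrated with a toy sketch. The abstract does not give the actual formulation, so everything here is an assumption for illustration only: the function name `structural_noise_init`, the parameters `sigma_edit` and `sigma_keep`, the binary edit mask, and the variance-preserving blend `sqrt(1 - sigma^2) * x + sigma * eps` are hypothetical and are not taken from the paper.

```python
import math
import random


def structural_noise_init(latent, edit_mask, sigma_edit=0.9, sigma_keep=0.2, seed=0):
    """Blend each latent value with Gaussian noise at a region-dependent level.

    latent:    2D list of floats (a toy stand-in for one latent frame)
    edit_mask: 2D list of 0/1 flags, 1 marking positions to be edited

    Hypothetical illustration; not the formulation used in the paper.
    """
    if not 0.0 <= sigma_keep <= sigma_edit <= 1.0:
        raise ValueError("expected 0 <= sigma_keep <= sigma_edit <= 1")
    rng = random.Random(seed)
    noisy = []
    for row, mask_row in zip(latent, edit_mask):
        out_row = []
        for x, m in zip(row, mask_row):
            # Higher noise where content should change, lower where it should persist.
            sigma = sigma_edit if m else sigma_keep
            eps = rng.gauss(0.0, 1.0)
            # Assumed variance-preserving blend of signal and noise.
            out_row.append(math.sqrt(1.0 - sigma * sigma) * x + sigma * eps)
        noisy.append(out_row)
    return noisy


if __name__ == "__main__":
    frame = [[1.0, 2.0], [3.0, 4.0]]
    mask = [[0, 1], [0, 0]]  # only the top-right position is edited
    print(structural_noise_init(frame, mask))
```

In this sketch, setting `sigma_keep` to 0 leaves unedited positions untouched, while setting `sigma_edit` to 1 replaces edited positions with pure noise; intermediate values trade off content change against consistency.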
