CVJun 10, 2024

FRAG: Frequency Adapting Group for Diffusion Video Editing

Sunjae Yoon, Gwanhyeong Koo, Geonwoo Kim, Chang D. Yoo

arXiv:2406.06044v212.813 citationsHas Code

Originality Incremental advance

AI Analysis

This work addresses video editing quality issues for users of diffusion models, but it is incremental as it builds on existing systems with a novel component.

The paper tackles the problem of quality deterioration like blurring and flickering in diffusion video editing by addressing high-frequency leak during denoising, resulting in improved consistency and fidelity as validated on benchmarks such as TGVE and DAVIS.

In video editing, the hallmark of a quality edit lies in its consistent and unobtrusive adjustment. Modification, when integrated, must be smooth and subtle, preserving the natural flow and aligning seamlessly with the original vision. Therefore, our primary focus is on overcoming the current challenges in high quality edit to ensure that each edit enhances the final product without disrupting its intended essence. However, quality deterioration such as blurring and flickering is routinely observed in recent diffusion video editing systems. We confirm that this deterioration often stems from high-frequency leak: the diffusion model fails to accurately synthesize high-frequency components during denoising process. To this end, we devise Frequency Adapting Group (FRAG) which enhances the video quality in terms of consistency and fidelity by introducing a novel receptive field branch to preserve high-frequency components during the denoising process. FRAG is performed in a model-agnostic manner without additional training and validates the effectiveness on video editing benchmarks (i.e., TGVE, DAVIS).

View on arXiv PDF Code

Similar