FRAG: Frequency Adapting Group for Diffusion Video Editing
This work addresses video editing quality issues for users of diffusion models, but it is incremental as it builds on existing systems with a novel component.
The paper tackles the problem of quality deterioration like blurring and flickering in diffusion video editing by addressing high-frequency leak during denoising, resulting in improved consistency and fidelity as validated on benchmarks such as TGVE and DAVIS.
In video editing, the hallmark of a quality edit lies in its consistent and unobtrusive adjustment. Modification, when integrated, must be smooth and subtle, preserving the natural flow and aligning seamlessly with the original vision. Therefore, our primary focus is on overcoming the current challenges in high quality edit to ensure that each edit enhances the final product without disrupting its intended essence. However, quality deterioration such as blurring and flickering is routinely observed in recent diffusion video editing systems. We confirm that this deterioration often stems from high-frequency leak: the diffusion model fails to accurately synthesize high-frequency components during denoising process. To this end, we devise Frequency Adapting Group (FRAG) which enhances the video quality in terms of consistency and fidelity by introducing a novel receptive field branch to preserve high-frequency components during the denoising process. FRAG is performed in a model-agnostic manner without additional training and validates the effectiveness on video editing benchmarks (i.e., TGVE, DAVIS).