CV | Apr 21, 2024

Motion-aware Latent Diffusion Models for Video Frame Interpolation

arXiv:2404.13534v3 | 24 citations | h-index: 14 | MM
Originality: Incremental advance
AI Analysis

This paper addresses video frame interpolation within video generation frameworks and offers improved visual quality in dynamic scenes, but the advance is incremental because it builds on existing diffusion models.

The paper tackles the problem of inaccurate motion estimation in video frame interpolation, which causes blurred results, by proposing a motion-aware latent diffusion model (MADiff) that progressively refines frames using motion priors, achieving state-of-the-art performance on benchmark datasets.

With the advancement of AIGC, video frame interpolation (VFI) has become a crucial component in existing video generation frameworks, attracting widespread research interest. For the VFI task, motion estimation between neighboring frames plays a crucial role in avoiding motion ambiguity. However, existing VFI methods often struggle to accurately predict the motion information between consecutive frames, and this imprecise estimation leads to blurred and visually incoherent interpolated frames. In this paper, we propose a novel diffusion framework, motion-aware latent diffusion models (MADiff), which is specifically designed for the VFI task. By incorporating motion priors between the conditional neighboring frames and the target interpolated frame predicted throughout the diffusion sampling procedure, MADiff progressively refines the intermediate outcomes, culminating in results that are both visually smooth and realistic. Extensive experiments conducted on benchmark datasets demonstrate that our method achieves state-of-the-art performance, significantly outperforming existing approaches, especially under challenging scenarios involving dynamic textures with complex motion.
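
The sampling idea in the abstract (re-estimating motion between the neighboring frames and the current prediction at every step, then refining) can be illustrated with a toy sketch. This is not MADiff's implementation: the function names, the difference-based "motion prior", and the update rule are all hypothetical stand-ins chosen so the loop runs with the Python standard library only. The real method operates in a latent space with learned networks, and the abstract does not specify them.

```python
import random


def toy_motion_prior(frame_a, frame_b):
    """Hypothetical stand-in for a motion estimator: per-pixel difference."""
    return [b - a for a, b in zip(frame_a, frame_b)]


def toy_refine_step(estimate, prev_frame, next_frame, step, num_steps):
    """One toy refinement step conditioned on motion priors (assumption).

    Motion is re-estimated between each neighbor and the *current* estimate,
    and the estimate is nudged so that both motions agree.
    """
    motion_prev = toy_motion_prior(prev_frame, estimate)  # prev -> estimate
    motion_next = toy_motion_prior(estimate, next_frame)  # estimate -> next
    rate = 1.0 / (num_steps - step)  # correction grows toward the last step
    return [
        x + 0.5 * rate * (m_next - m_prev)
        for x, m_prev, m_next in zip(estimate, motion_prev, motion_next)
    ]


def interpolate_middle_frame(prev_frame, next_frame, num_steps=10, seed=0):
    """Start from noise and progressively refine the intermediate frame."""
    rng = random.Random(seed)
    estimate = [rng.gauss(0.0, 1.0) for _ in prev_frame]  # pure noise
    for step in range(num_steps):
        estimate = toy_refine_step(
            estimate, prev_frame, next_frame, step, num_steps
        )
    return estimate


if __name__ == "__main__":
    # Frames are flat lists of pixel intensities in this toy setting.
    print(interpolate_middle_frame([0.0, 2.0, 4.0], [2.0, 4.0, 8.0]))
```

The toy update simply pulls the estimate toward the per-pixel average of its neighbors, which is the kind of naive blending that produces blur on real video. It is meant only to show the control flow of motion-conditioned iterative refinement, not the quality gains the paper reports.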

