CV · Dec 10, 2024

Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation

Jingxi Chen, Brandon Y. Feng, Haoming Cai, Tianfu Wang, Levi Burner, Dehao Yuan, Cornelia Fermuller, Christopher A. Metzler, Yiannis Aloimonos

arXiv:2412.07761v2 · 16.422 citations · h-index: 53 · CVPR

Originality: Incremental advance

AI Analysis

This work addresses the scarcity of paired event-frame training data in event-based video interpolation, a problem that matters for high-frame-rate video generation. The contribution is incremental because it repurposes existing pre-trained models rather than introducing a new one.

The paper tackles the problem of event-based video frame interpolation by adapting pre-trained video diffusion models to overcome limited training data, achieving superior performance and generalization across cameras compared to existing methods.

Video Frame Interpolation aims to recover realistic missing frames between observed frames, generating a high-frame-rate video from a low-frame-rate video. However, without additional guidance, the large motion between frames makes this problem ill-posed. Event-based Video Frame Interpolation (EVFI) addresses this challenge by using sparse, high-temporal-resolution event measurements as motion guidance. This guidance allows EVFI methods to significantly outperform frame-only methods. However, to date, EVFI methods have relied on a limited set of paired event-frame training data, severely limiting their performance and generalization capabilities. In this work, we overcome the limited data challenge by adapting pre-trained video diffusion models trained on internet-scale datasets to EVFI. We experimentally validate our approach on real-world EVFI datasets, including a new one that we introduce. Our method outperforms existing methods and generalizes across cameras far better than existing approaches.
