CV · Jul 11, 2024

Generalizable Implicit Motion Modeling for Video Frame Interpolation

arXiv:2407.08680v5 · 18 citations · h-index: 24
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of modeling spatiotemporal dynamics in real-world videos for video frame interpolation. It is an incremental improvement over existing flow-based methods.

The paper tackles motion modeling in video frame interpolation by introducing Generalizable Implicit Motion Modeling (GIMM). GIMM uses a motion encoding pipeline and an adaptive coordinate-based neural network to predict optical flows at arbitrary timesteps, and it reports state-of-the-art performance on standard benchmarks.

Motion modeling is critical in flow-based Video Frame Interpolation (VFI). Existing paradigms either consider linear combinations of bidirectional flows or directly predict bilateral flows for given timestamps without exploring favorable motion priors, and thus cannot effectively model spatiotemporal dynamics in real-world videos. To address this limitation, we introduce Generalizable Implicit Motion Modeling (GIMM), a novel and effective approach to motion modeling for VFI. Specifically, to make GIMM an effective motion modeling paradigm, we design a motion encoding pipeline that models a spatiotemporal motion latent from bidirectional flows extracted by pre-trained flow estimators, effectively representing input-specific motion priors. We then implicitly predict optical flows at arbitrary timesteps between two adjacent input frames via an adaptive coordinate-based neural network that takes spatiotemporal coordinates and the motion latent as inputs. GIMM can be easily integrated with existing flow-based VFI works by supplying accurately modeled motion. We show that GIMM performs better than the current state of the art on standard VFI benchmarks.
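The contrast the abstract draws, between fixed linear combinations of bidirectional flows and an implicit network queried at arbitrary timesteps, can be illustrated with a small NumPy sketch. The first function is one well-known "linear combination" baseline (the Super SloMo approximation of Jiang et al., CVPR 2018), not GIMM itself. The remaining pieces are hypothetical: `toy_motion_latent` and `ToyImplicitFlowNet` are illustrative names, not from the paper. The "latent" is just the two flows stacked per pixel and the MLP weights are random, so the sketch only shows the interface GIMM describes (feed normalized spatiotemporal coordinates plus a per-pixel motion latent, read back a flow for any t in [0, 1]). It does not reproduce the paper's encoder, architecture, or trained behaviour.

```python
import numpy as np


def linear_combination_flows(f01, f10, t):
    """Super SloMo-style baseline: approximate F_t->0 and F_t->1 as fixed
    linear combinations of the bidirectional flows F_0->1 and F_1->0."""
    ft0 = -(1.0 - t) * t * f01 + t * t * f10
    ft1 = (1.0 - t) ** 2 * f01 - t * (1.0 - t) * f10
    return ft0, ft1


def toy_motion_latent(f01, f10):
    """Hypothetical stand-in for a learned motion encoder: GIMM learns a
    spatiotemporal motion latent, here we simply stack both flows (4 channels)."""
    return np.concatenate([f01, f10], axis=-1)  # (H, W, 4)


class ToyImplicitFlowNet:
    """Hypothetical coordinate-based MLP mapping (x, y, t, latent) -> flow (u, v).

    Weights are random; this illustrates only the query interface.
    """

    def __init__(self, latent_dim=4, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = 3 + latent_dim  # normalized (x, y, t) plus the per-pixel latent
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, 2))
        self.b2 = np.zeros(2)

    def __call__(self, latent, t):
        h, w, _ = latent.shape
        # Pixel coordinates normalized to [-1, 1]; ys varies along rows, xs along columns.
        ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
        ts = np.full((h, w), t)  # the same query timestep at every pixel
        coords = np.stack([xs, ys, ts], axis=-1)        # (H, W, 3)
        x = np.concatenate([coords, latent], axis=-1)    # (H, W, 3 + C)
        hid = np.maximum(x @ self.w1 + self.b1, 0.0)     # ReLU hidden layer
        return hid @ self.w2 + self.b2                   # (H, W, 2) flow at time t


# Usage: constant toy motion of +1 px from frame 0 to 1 (and -1 px back).
H, W = 4, 6
f01 = np.ones((H, W, 2))
f10 = -np.ones((H, W, 2))
ft0, ft1 = linear_combination_flows(f01, f10, 0.5)
net = ToyImplicitFlowNet()
flow_t = net(toy_motion_latent(f01, f10), t=0.25)  # any t in [0, 1] can be queried
```

For this constant-motion toy input the baseline gives F_t->0 = -0.5 and F_t->1 = 0.5 at t = 0.5, matching a purely linear motion model. The baseline can only rescale the two input flows, whereas a coordinate-based network can, once trained, represent motion that varies non-linearly over t.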

