Video Frame Interpolation with Region-Distinguishable Priors from SAM
This work targets motion estimation accuracy in video frame interpolation, which matters for applications such as video editing and slow-motion generation. It is an incremental improvement that integrates segmentation priors into existing motion-based methods.
The paper tackles motion estimation ambiguity in Video Frame Interpolation (VFI) by using the Segment Anything Model (SAM) to generate Region-Distinguishable Priors (RDPs), represented as spatially varying Gaussian mixtures. These priors make encoder features more similar for matched regions in neighboring frames and improve intermediate frame synthesis. Experiments show consistent performance gains across scenes.
In existing Video Frame Interpolation (VFI) approaches, motion estimation between neighboring frames plays a crucial role. However, estimation accuracy remains a challenge, primarily because of the inherent ambiguity in identifying corresponding areas in adjacent frames. Distinguishing different regions before motion estimation is therefore important for improving accuracy. In this paper, we introduce a novel solution that uses open-world segmentation models, e.g., SAM (Segment Anything Model), to derive Region-Distinguishable Priors (RDPs) in different frames. These RDPs are represented as spatially varying Gaussian mixtures, distinguishing an arbitrary number of areas with a unified modality. RDPs can be integrated into existing motion-based VFI methods to enhance features for motion estimation, facilitated by our plug-and-play Hierarchical Region-aware Feature Fusion Module (HRFFM). HRFFM incorporates RDPs into multiple hierarchical stages of the VFI encoder, using RDP-guided Feature Normalization (RDPFN) in a residual learning manner. With HRFFM and RDPs, the features within the VFI encoder exhibit similar representations for matched regions in neighboring frames, thus improving the synthesis of intermediate frames. Extensive experiments demonstrate that HRFFM consistently enhances VFI performance across various scenes.
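The two ideas in the abstract, a Gaussian-mixture prior built from region labels and a prior-guided normalization applied as a residual, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a SAM-style integer label map is already available, draws each pixel of the prior from a Gaussian chosen by its region label, and uses a simple per-channel linear modulation in place of whatever learned layers the paper uses. The function names `region_prior_from_labels` and `rdp_guided_norm`, and the parameters `means`, `stds`, `w_gamma`, and `w_beta`, are hypothetical.

```python
import numpy as np


def region_prior_from_labels(labels, means, stds, rng):
    """Build a single-channel prior map from an integer region label map.

    Assumption: a pixel in region k is drawn from N(means[k], stds[k]**2),
    so the whole map is a spatially varying Gaussian mixture whose
    component is selected by the region label.
    labels: (H, W) int array; means, stds: (K,) float arrays.
    """
    noise = rng.standard_normal(labels.shape)
    return means[labels] + stds[labels] * noise


def rdp_guided_norm(features, prior, w_gamma, w_beta, eps=1e-5):
    """Prior-guided feature normalization with a residual connection.

    Assumption: scale and shift are linear in the prior, one weight per
    channel. A real module would predict them with learned layers.
    features: (C, H, W); prior: (H, W); w_gamma, w_beta: (C,).
    """
    # Normalize each channel over its spatial positions.
    mu = features.mean(axis=(1, 2), keepdims=True)
    sigma = features.std(axis=(1, 2), keepdims=True)
    normalized = (features - mu) / (sigma + eps)

    # Per-pixel modulation derived from the prior map.
    gamma = w_gamma[:, None, None] * prior[None, :, :]
    beta = w_beta[:, None, None] * prior[None, :, :]

    # Residual form: the input features pass through unchanged when the
    # modulation weights are zero.
    return features + normalized * gamma + beta


# Toy usage: two regions on a 2x3 frame, two feature channels.
rng = np.random.default_rng(0)
labels = np.array([[0, 0, 1], [0, 1, 1]])
prior = region_prior_from_labels(
    labels, means=np.array([-1.0, 2.0]), stds=np.array([0.1, 0.1]), rng=rng
)
features = rng.standard_normal((2, 2, 3))
fused = rdp_guided_norm(
    features, prior, w_gamma=np.array([0.5, 0.5]), w_beta=np.array([0.1, 0.1])
)
```

Because pixels in the same region share a Gaussian component, the modulation they receive is similar within a region and different across regions, which is the intuition behind features becoming more alike for matched regions in neighboring frames. In a hierarchical setting, the prior would presumably be resized to each encoder stage's resolution before the same step is applied; that detail is also an assumption here.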