Frame Interpolation with Multi-Scale Deep Loss Functions and Generative Adversarial Networks
This work addresses video frame interpolation for computer vision applications, offering a significant speed improvement over existing methods.
The paper tackles frame interpolation by introducing a multi-scale generative adversarial network (FIGAN) that improves both accuracy and runtime. It improves on state-of-the-art accuracy and delivers visual quality comparable to the best-performing interpolation method while running 47 times faster.
Frame interpolation attempts to synthesise frames given one or more consecutive video frames. In recent years, deep learning approaches, notably convolutional neural networks, have succeeded at tackling low- and high-level computer vision problems, including frame interpolation. These techniques typically address two concerns: algorithm efficiency and reconstruction quality. In this paper, we present a multi-scale generative adversarial network for frame interpolation (FIGAN). To maximise the efficiency of our network, we propose a novel multi-scale residual estimation module in which the predicted flow and synthesised frame are constructed in a coarse-to-fine fashion. To improve the quality of the synthesised intermediate video frames, our network is jointly supervised at different levels with a perceptual loss function that consists of an adversarial loss and two content losses. We evaluate the proposed approach using a collection of 60fps videos from YouTube-8M. Our results improve the state-of-the-art accuracy and provide subjective visual quality comparable to the best-performing interpolation method while running 47 times faster.
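The coarse-to-fine construction of the flow mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the FIGAN architecture: the function names, the average-pooling pyramid, the nearest-neighbour upsampling and the `estimate_residual` callable (a stand-in for a learned module) are all assumptions made for illustration. The sketch only shows the general pattern of estimating a flow at the coarsest scale and adding a residual correction at each finer scale.

```python
import numpy as np

def downsample(x, factor):
    """Average-pool an (H, W, C) array by an integer factor (assumed pyramid)."""
    h, w, c = x.shape
    return x.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

def upsample_flow(flow, factor=2):
    """Nearest-neighbour upsampling of an (h, w, 2) flow field.

    Displacements are multiplied by the factor so that they stay expressed
    in pixels of the finer grid.
    """
    return np.repeat(np.repeat(flow, factor, axis=0), factor, axis=1) * factor

def coarse_to_fine_flow(frame0, frame1, estimate_residual, num_scales=3):
    """Refine a flow field from the coarsest scale up to full resolution.

    estimate_residual(f0, f1, flow) is a hypothetical placeholder for a
    learned module; it must return a correction of shape (h, w, 2).
    Frame height and width must be divisible by 2 ** (num_scales - 1).
    """
    flow = None
    for level in reversed(range(num_scales)):  # coarsest level first
        factor = 2 ** level
        f0 = downsample(frame0, factor)
        f1 = downsample(frame1, factor)
        if flow is None:
            flow = np.zeros(f0.shape[:2] + (2,))  # start from zero motion
        else:
            flow = upsample_flow(flow)  # carry the coarser estimate forward
        flow = flow + estimate_residual(f0, f1, flow)  # residual refinement
    return flow
```

Under these assumptions, most of the motion is resolved on small feature maps and each finer level only has to predict a small correction, which is one plausible reading of why a multi-scale residual design helps runtime. In a trained system the placeholder would be a convolutional sub-network, and a loss could be attached to the output of every level to supervise the network at different scales.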