CV · Dec 16, 2019

FISR: Deep Joint Frame Interpolation and Super-Resolution with a Multi-scale Temporal Loss

arXiv:1912.07213v2 · 71 citations
Originality: Incremental advance
AI Analysis

The work addresses the need for realistic, high-quality video up-conversion in broadcasting and display applications, though it is incremental in that it builds on existing VFI and SR methods.

The paper tackles the problem of up-converting legacy videos to higher spatio-temporal resolutions (e.g., 2K 30 fps to 4K 60 fps) by proposing a joint framework for video frame interpolation and super-resolution, using a multi-scale temporal loss to reduce motion artifacts and improve video quality.

Super-resolution (SR) has been widely used to convert low-resolution legacy videos to high-resolution (HR) ones, to suit the increasing resolution of displays (e.g. UHD TVs). However, motion artifacts (e.g. motion judder) become easier for humans to notice when HR videos are rendered on larger display devices. Thus, broadcasting standards support higher frame rates for UHD (Ultra High Definition) videos (4K@60 fps, 8K@120 fps), meaning that applying SR alone is insufficient to produce genuinely high-quality videos. Hence, to up-convert legacy videos for realistic applications, not only SR but also video frame interpolation (VFI) is needed. In this paper, we propose, for the first time, a joint VFI-SR framework for up-scaling the spatio-temporal resolution of videos from 2K 30 fps to 4K 60 fps. For this, we propose a novel training scheme with a multi-scale temporal loss that imposes temporal regularization on the input video sequence, and which can be applied to any general video-related task. The proposed structure is analyzed in depth with extensive experiments.
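To make the 2K 30 fps to 4K 60 fps target concrete, the following is a minimal bookkeeping sketch of what a joint ×2 spatial, ×2 temporal up-conversion implies for frame size, frame rate, and frame count. It is not the paper's model. The pixel dimensions (1920×1080 for 2K, 3840×2160 for 4K), the function names, and the convention that new frames are inserted only between existing consecutive frames are all assumptions made for illustration.

```python
def upconverted_spec(width, height, fps, num_frames,
                     spatial_scale=2, temporal_scale=2):
    """Return (width, height, fps, num_frames) after joint up-conversion.

    Hypothetical helper: assumes interpolation only fills the gaps
    between consecutive input frames, so each of the (num_frames - 1)
    gaps gains (temporal_scale - 1) new frames.
    """
    if num_frames <= 0:
        out_frames = 0
    else:
        out_frames = (num_frames - 1) * temporal_scale + 1
    return (width * spatial_scale,
            height * spatial_scale,
            fps * temporal_scale,
            out_frames)


def pixel_rate(width, height, fps):
    """Pixels per second that must be produced at a given format."""
    return width * height * fps


# Assumed dimensions: "2K" as 1920x1080 and "4K" as 3840x2160.
src_w, src_h, src_fps = 1920, 1080, 30
out_w, out_h, out_fps, out_n = upconverted_spec(src_w, src_h, src_fps,
                                                num_frames=3)

# SR alone quadruples the pixels per frame; adding VFI doubles the
# frame rate on top of that, for an 8x increase in pixels per second.
ratio = pixel_rate(out_w, out_h, out_fps) // pixel_rate(src_w, src_h, src_fps)
print(out_w, out_h, out_fps, out_n, ratio)
```

Under these assumptions, three input frames yield five output frames, and the output carries eight times as many pixels per second as the input, which is one way to see why SR alone leaves half of the up-conversion undone.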

Code Implementations: 1 repo
