First image then video: A two-stage network for spatiotemporal video denoising
This work addresses video denoising for applications such as video processing and computer vision, but it is incremental: it builds on existing spatiotemporal methods with a simple two-stage modification.
The paper tackles motion blur artifacts in spatiotemporal video denoising by proposing a two-stage neural network that first denoises each frame spatially and then applies spatiotemporal processing. It reports state-of-the-art performance on the Vimeo90K dataset, with improvements in both denoising quality and computation.
Video denoising aims to remove noise from noise-corrupted video data, recovering the true signal via spatiotemporal processing. Existing approaches to spatiotemporal video denoising tend to suffer from motion blur artifacts: the boundary of a moving object tends to appear blurry, especially when the object undergoes fast motion that causes optical flow calculation to break down. In this paper, we address this challenge by designing a first-image-then-video two-stage denoising neural network, consisting of an image denoising module that spatially reduces intra-frame noise, followed by a regular spatiotemporal video denoising module. The intuition is simple yet effective: the first stage of image denoising reduces the noise level and therefore allows the second stage of spatiotemporal denoising to model and learn better everywhere, including along moving object boundaries. This two-stage network, when trained in an end-to-end fashion, yields state-of-the-art performance on the Vimeo90K video denoising benchmark in terms of both denoising quality and computation. It also enables an unsupervised approach that achieves performance comparable to existing supervised approaches.
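The first-image-then-video data flow can be illustrated with a minimal sketch. It assumes NumPy and uses fixed, hand-written filters as stand-ins for the learned modules: a 3x3 box filter in place of the image denoising module and a temporal neighbour average in place of the spatiotemporal module. Neither stand-in is the network described above, there is no training or optical flow, and all function names are hypothetical; only the ordering of the two stages follows the text.

```python
import numpy as np


def spatial_denoise(frame, k=3):
    """Stage 1 stand-in: k x k box filter applied to a single (H, W) frame."""
    pad = k // 2
    padded = np.pad(frame, pad, mode="edge")
    out = np.zeros_like(frame, dtype=np.float64)
    h, w = frame.shape
    # Sum the k*k shifted copies of the frame, then normalise.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def temporal_denoise(frames, radius=1):
    """Stage 2 stand-in: average each frame with its temporal neighbours."""
    num_frames = frames.shape[0]
    out = np.empty_like(frames, dtype=np.float64)
    for t in range(num_frames):
        lo, hi = max(0, t - radius), min(num_frames, t + radius + 1)
        out[t] = frames[lo:hi].mean(axis=0)
    return out


def two_stage_denoise(noisy):
    """Denoise a (T, H, W) clip: per-frame spatial pass, then temporal pass."""
    # Stage 1: reduce intra-frame noise independently in every frame.
    stage1 = np.stack([spatial_denoise(f) for f in noisy])
    # Stage 2: spatiotemporal processing on the already cleaner frames.
    return temporal_denoise(stage1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.full((5, 8, 8), 0.5)  # static synthetic clip
    noisy = clean + rng.normal(0.0, 0.1, clean.shape)
    denoised = two_stage_denoise(noisy)
    print("noisy MSE:   ", np.mean((noisy - clean) ** 2))
    print("denoised MSE:", np.mean((denoised - clean) ** 2))
```

In the actual method both stages are neural modules trained jointly end to end; the sketch fixes them as filters only to make the ordering concrete, so that the second stage always receives frames whose noise level has already been lowered by the first.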