Self-Supervised training for blind multi-frame video denoising
This work addresses the challenge of denoising videos whose noise distribution is unknown, with applications in video processing. The contribution is incremental in that it builds on existing multi-frame denoising networks.
The paper tackles blind multi-frame video denoising by proposing a self-supervised training approach that fine-tunes a pre-trained network on a single video corrupted by an unknown noise type. After a few frames, the fine-tuned network reaches, and sometimes surpasses, the performance of a state-of-the-art network trained with supervision.
We propose a self-supervised approach for training multi-frame video denoising networks. These networks predict frame t from a window of frames around t. Our self-supervised approach exploits the temporal consistency of video by minimizing a loss between the predicted frame t and a neighboring target frame, the two being aligned using optical flow. We use the proposed strategy for online internal learning, where a pre-trained network is fine-tuned to denoise a new unknown noise type from a single video. After a few frames, the proposed fine-tuning reaches and sometimes surpasses the performance of a state-of-the-art network trained with supervision. In addition, for a wide range of noise types, it can be applied blindly without knowing the noise distribution. We demonstrate this by showing results on blind denoising of different synthetic and realistic noises.
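The loss described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names `warp_to_reference` and `masked_l1_loss`, the nearest-neighbor warping, the L1 distance, and the validity mask for pixels whose flow points outside the frame are all assumptions made for illustration. In practice the prediction would come from the denoising network and the flow from an optical flow estimator; here both are hard-coded toy values.

```python
def warp_to_reference(target, flow):
    """Align a neighboring frame to frame t (hypothetical helper).

    target: H x W list of floats (the noisy neighboring frame).
    flow:   H x W list of (dx, dy); pixel (x, y) of frame t is assumed to
            correspond to pixel (x + dx, y + dy) of the neighboring frame.
    Returns the warped frame and a mask of pixels with a valid correspondence.
    """
    h, w = len(target), len(target[0])
    warped = [[0.0] * w for _ in range(h)]
    valid = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            # Nearest-neighbor sampling (an assumption; bilinear is also common).
            sx, sy = x + int(round(dx)), y + int(round(dy))
            if 0 <= sx < w and 0 <= sy < h:
                warped[y][x] = target[sy][sx]
                valid[y][x] = True
    return warped, valid


def masked_l1_loss(prediction, warped_target, valid):
    """Mean absolute difference over pixels with a valid correspondence."""
    total, count = 0.0, 0
    for pred_row, tgt_row, valid_row in zip(prediction, warped_target, valid):
        for p, t, ok in zip(pred_row, tgt_row, valid_row):
            if ok:
                total += abs(p - t)
                count += 1
    return total / count if count else 0.0


# Toy data: stand-in for the network's prediction of frame t.
prediction = [[1.5, 2.0, 3.0],
              [4.0, 5.0, 6.0]]
# Neighboring frame, offset by one pixel relative to frame t.
neighbor = [[0.0, 1.0, 2.0],
            [0.0, 4.0, 5.0]]
# Constant flow: each pixel of frame t maps one pixel to the right.
flow = [[(1, 0)] * 3 for _ in range(2)]

warped, valid = warp_to_reference(neighbor, flow)
loss = masked_l1_loss(prediction, warped, valid)
print(loss)  # 0.125
```

In this toy case the last column has no valid correspondence and is excluded, so the loss averages over the four remaining pixels. No clean frame appears anywhere in the computation, which is what makes the training signal self-supervised.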