CV, Jul 22, 2022

Spatio-Temporal Deformable Attention Network for Video Deblurring

arXiv:2207.10852v1, 57 citations, h-index: 48
Originality: Incremental advance
AI Analysis

This paper addresses blurry video frames for applications such as video enhancement, but it is an incremental advance because it builds on existing alignment-based methods.

The paper tackles video deblurring by proposing STDANet, which selectively uses sharp pixels based on pixel-wise blur levels and performs favorably against state-of-the-art methods on the GoPro, DVD, and BSD datasets.

The key success factor of video deblurring methods is to compensate for the blurry pixels of the mid-frame with the sharp pixels of the adjacent video frames. Therefore, mainstream methods align the adjacent frames based on estimated optical flows and fuse the aligned frames for restoration. However, these methods sometimes generate unsatisfactory results because they rarely consider the blur levels of pixels, which may introduce blurry pixels from the adjacent frames. In fact, not all the pixels in the video frames are sharp and beneficial for deblurring. To address this problem, we propose the spatio-temporal deformable attention network (STDANet) for video deblurring, which extracts the information of sharp pixels by considering the pixel-wise blur levels of the video frames. Specifically, STDANet is an encoder-decoder network combined with a motion estimator and a spatio-temporal deformable attention (STDA) module, where the motion estimator predicts coarse optical flows that are used as base offsets to find the corresponding sharp pixels in the STDA module. Experimental results indicate that the proposed STDANet performs favorably against state-of-the-art methods on the GoPro, DVD, and BSD datasets.
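The idea of using a coarse optical flow as a base offset for attention-weighted sampling can be illustrated with a minimal pure-Python sketch. This is not the STDANet implementation: the function names (`bilinear_sample`, `deformable_sample`), the single-channel frame, the hand-supplied residual offsets, and the hand-supplied attention scores are all assumptions made for illustration. In a trained network the residual offsets and scores would presumably be predicted from features rather than passed in.

```python
import math

def bilinear_sample(img, x, y):
    """Sample a 2D grid (list of rows) at fractional (x, y), clamping to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def deformable_sample(neighbor, x, y, flow, residual_offsets, scores):
    """Aggregate K samples from a neighbor frame for mid-frame pixel (x, y).

    flow             : (u, v) coarse optical flow at (x, y), used as the base offset
    residual_offsets : K hypothetical (dx, dy) refinements around the base offset
    scores           : K unnormalized attention scores (higher = trusted more,
                       e.g. a sample believed to be sharper)
    """
    weights = softmax(scores)
    u, v = flow
    out = 0.0
    for (dx, dy), w in zip(residual_offsets, weights):
        # Sampling location = pixel position + base offset (flow) + residual offset.
        out += w * bilinear_sample(neighbor, x + u + dx, y + v + dy)
    return out

# Toy 3x3 neighbor frame; the flow says the content moved one pixel to the right.
neighbor = [[0.0, 1.0, 2.0],
            [3.0, 4.0, 5.0],
            [6.0, 7.0, 8.0]]
result = deformable_sample(
    neighbor, x=1, y=1, flow=(1.0, 0.0),
    residual_offsets=[(0.0, 0.0), (-0.5, 0.0)],
    scores=[0.0, 0.0],  # equal scores give equal weights
)
print(result)  # 4.75: the mean of the samples at (2, 1) and (1.5, 1)
```

Raising the score of one sampling point shifts the output toward that sample, which is how an attention mechanism of this kind could favor sharp pixels over blurry ones.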

Code Implementations: 1 repo