Fast Spatio-Temporal Residual Network for Video Super-Resolution
This work addresses computational efficiency in video super-resolution, which matters for real-time applications; the contribution is incremental, since it builds on existing deep learning methods.
The paper tackles the high computational complexity of using 3D convolutions for video super-resolution by proposing a fast spatio-temporal residual network (FSTRN) that reduces computational load while enhancing performance, achieving state-of-the-art results on benchmark datasets.
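To make the cost argument concrete, the sketch below counts weights and multiply-adds for one full 3D convolution and for a factorized spatial-then-temporal pair. It assumes the factorization splits a k × k × k filter into a 1 × k × k spatial filter followed by a k × 1 × 1 temporal filter, with stride 1, 'same' padding, and equal channel widths. The function names, kernel size, channel count, and clip size are hypothetical illustrative choices, not the paper's configuration.

```python
def conv3d_weights(c_in, c_out, kt, kh, kw):
    """Number of weights in a 3D convolution with a kt x kh x kw kernel (bias ignored)."""
    return c_in * c_out * kt * kh * kw


def factorized_weights(c_in, c_mid, c_out, k):
    """Weights of an assumed 1 x k x k spatial conv (c_in -> c_mid)
    followed by a k x 1 x 1 temporal conv (c_mid -> c_out)."""
    return conv3d_weights(c_in, c_mid, 1, k, k) + conv3d_weights(c_mid, c_out, k, 1, 1)


def multiply_adds(weights, t, h, w):
    """Multiply-adds for stride-1 layers with 'same' padding: every weight is
    used once per output position of a t x h x w volume. This also holds for
    the factorized pair, because its intermediate output keeps the same size."""
    return weights * t * h * w


if __name__ == "__main__":
    k, c = 3, 64            # hypothetical kernel size and channel width
    t, h, w = 5, 64, 64     # hypothetical clip: 5 low-resolution frames of 64 x 64
    full = conv3d_weights(c, c, k, k, k)
    fact = factorized_weights(c, c, c, k)
    print(f"full 3D:    {full:>9,d} weights, {multiply_adds(full, t, h, w):>15,d} MACs")
    print(f"factorized: {fact:>9,d} weights, {multiply_adds(fact, t, h, w):>15,d} MACs")
    print(f"reduction:  {full / fact:.2f}x")
```

With equal channel widths the saving is k³ / (k² + k), which is 27 / 12 = 2.25 for k = 3 and grows with k (125 / 30 ≈ 4.2 for k = 5). Because every weight is reused at each output position, the multiply-add count shrinks by the same factor, which is the budget that can then be spent on a deeper network.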
Recently, deep-learning-based video super-resolution (SR) methods have achieved promising performance. To exploit the spatial and temporal information of videos simultaneously, employing three-dimensional (3D) convolutions is a natural approach. However, applying 3D convolutions directly can lead to excessively high computational complexity, which restricts the depth of video SR models and thus undermines performance. In this paper, we present a novel fast spatio-temporal residual network (FSTRN) that adopts 3D convolutions for the video SR task to improve performance while maintaining a low computational load. Specifically, we propose a fast spatio-temporal residual block (FRB) that decomposes each 3D filter into the product of two 3D filters of considerably lower dimensions. Furthermore, we design a cross-space residual learning scheme that directly links the low-resolution space and the high-resolution space, which can greatly relieve the computational burden on the feature fusion and up-scaling parts. Extensive evaluations and comparisons on benchmark datasets validate the strengths of the proposed approach and demonstrate that the proposed network significantly outperforms current state-of-the-art methods.
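The idea of splitting a 3D filter into "the product of two 3D filters" can be checked numerically on a single channel. The sketch below assumes the split is a 1 × k × k spatial filter followed by a k × 1 × 1 temporal filter (the abstract does not spell out the shapes), and it ignores multiple channels, biases, nonlinearities, and padding. The helper names are hypothetical, and `sliding_window_view` requires NumPy 1.20 or newer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3d_valid(x, kernel):
    """Single-channel 3D cross-correlation with 'valid' padding.

    x has shape (time, height, width); kernel has shape (kt, kh, kw).
    """
    # windows[t, h, w] is the kt x kh x kw patch whose corner is at (t, h, w)
    windows = sliding_window_view(x, kernel.shape)
    return np.einsum("thwabc,abc->thw", windows, kernel)


def factorized_conv3d(x, spatial, temporal):
    """Apply the 1 x k x k spatial filter first, then the k x 1 x 1 temporal filter."""
    return conv3d_valid(conv3d_valid(x, spatial), temporal)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = 3
    x = rng.standard_normal((7, 16, 16))        # toy clip: 7 frames of 16 x 16
    spatial = rng.standard_normal((1, k, k))    # stand-in for a learned spatial filter
    temporal = rng.standard_normal((k, 1, 1))   # stand-in for a learned temporal filter

    two_small = factorized_conv3d(x, spatial, temporal)
    full_filter = temporal * spatial            # broadcast outer product, shape (k, k, k)
    one_big = conv3d_valid(x, full_filter)

    print(two_small.shape, np.allclose(two_small, one_big))  # (5, 14, 14) True
    print("weights:", spatial.size + temporal.size, "vs", full_filter.size)  # 12 vs 27
```

In this linear, single-channel setting, the two small filters act exactly like one k × k × k filter whose entries are temporal[a] · spatial[b, c]. The factorization therefore covers only this rank-1 subset of full 3D filters, in exchange for k² + k instead of k³ weights. With several intermediate channels and a nonlinearity between the two convolutions, as a real block would likely use, the factorized layer is more expressive than a single rank-1 filter.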