A Lightweight Recurrent Grouping Attention Network for Video Super-Resolution
This work addresses the need for efficient video super-resolution models to reduce device stress, though it appears incremental as it builds on existing recurrent and attention mechanisms.
The authors tackled the problem of high computational demand in video super-resolution by proposing a lightweight recurrent grouping attention network with only 0.878M parameters, achieving state-of-the-art performance on multiple datasets.
Effective aggregation of temporal information across consecutive frames is at the core of video super-resolution (VSR). Many researchers have used sliding-window and recurrent structures to gather spatio-temporal information from frames. However, although the performance of VSR models keeps improving, model size is also growing, which increases the demands placed on hardware. To reduce this load on the device, we propose a novel lightweight recurrent grouping attention network. The model has only 0.878M parameters, far fewer than current mainstream VSR models. We design a forward feature extraction module and a backward feature extraction module to collect temporal information between consecutive frames from both directions. Moreover, a new grouping mechanism is proposed to efficiently collect spatio-temporal information from the reference frame and its neighboring frames. An attention supplementation module is presented to further extend the range over which the model gathers information. A feature reconstruction module aggregates information from the different directions to reconstruct high-resolution features. Experiments demonstrate that our model achieves state-of-the-art performance on multiple datasets.
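The two-direction propagation described above can be illustrated with a minimal sketch. It is not the authors' architecture: the names `propagate`, `toy_step`, `fuse`, and `bidirectional_features` are hypothetical, the learned feature extraction modules are replaced by a simple running average, and the reconstruction step is replaced by a plain mean of the two directions. The grouping and attention supplementation modules are not modeled. The sketch only shows how a forward pass and a backward pass each produce one hidden state per frame, and how the two are then combined.

```python
from typing import Callable, List

Feature = List[float]
Step = Callable[[Feature, Feature], Feature]


def propagate(frames: List[Feature], step: Step, reverse: bool = False) -> List[Feature]:
    """Run one recurrent pass over the frames.

    Returns one hidden state per frame, indexed in the original frame order
    regardless of the direction of the pass.
    """
    n = len(frames)
    order = range(n - 1, -1, -1) if reverse else range(n)
    hidden: Feature = [0.0] * len(frames[0])  # zero initial state (assumption)
    states: List[Feature] = [[] for _ in range(n)]
    for t in order:
        hidden = step(frames[t], hidden)
        states[t] = hidden
    return states


def toy_step(frame: Feature, hidden: Feature) -> Feature:
    # Hypothetical stand-in for a learned feature extraction module:
    # mix the current frame with the state carried from the previous step.
    return [0.5 * f + 0.5 * h for f, h in zip(frame, hidden)]


def fuse(forward: List[Feature], backward: List[Feature]) -> List[Feature]:
    # Hypothetical stand-in for feature reconstruction:
    # average the forward and backward states of each frame.
    return [[(a + b) / 2 for a, b in zip(fw, bw)] for fw, bw in zip(forward, backward)]


def bidirectional_features(frames: List[Feature]) -> List[Feature]:
    forward = propagate(frames, toy_step)
    backward = propagate(frames, toy_step, reverse=True)
    return fuse(forward, backward)


if __name__ == "__main__":
    # Three one-dimensional "frames"; only the first carries a signal.
    print(bidirectional_features([[1.0], [0.0], [0.0]]))
```

In this toy setting, the signal in the first frame reaches later frames only through the forward pass, and a signal in the last frame would reach earlier frames only through the backward pass, which is why both directions are combined before reconstruction.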