CVDec 30, 2023

Image Super-Resolution Reconstruction Network based on Enhanced Swin Transformer via Alternating Aggregation of Local-Global Features

arXiv:2401.00241v51 citationsh-index: 4Has CodeIEEE Trans Emerg Top Comput Intell
Originality Incremental advance
AI Analysis

This work addresses image quality improvement for super-resolution applications, representing an incremental advancement over existing transformer-based methods.

The paper tackles the problem of image super-resolution by enhancing the Swin Transformer to better integrate local and global features, achieving higher PSNR scores, such as surpassing SRCNN by 2.17dB and other models by 0.1-0.13dB, as validated on five public datasets.

The Swin Transformer image super-resolution (SR) reconstruction network primarily depends on the long-range relationship of the window and shifted window attention to explore features. However, this approach focuses only on global features, ignoring local ones, and considers only spatial interactions, disregarding channel and spatial-channel feature interactions, limiting its nonlinear mapping capability. Therefore, this study proposes an enhanced Swin Transformer network (ESTN) that alternately aggregates local and global features. During local feature aggregation, shift convolution facilitates the interaction between local spatial and channel information. During global feature aggregation, a block sparse global perception module is introduced, wherein spatial information is reorganized and the recombined features are then processed by a dense layer to achieve global perception. Additionally, multiscale self-attention and low-parameter residual channel attention modules are introduced to aggregate information across different scales. Finally, the effectiveness of ESTN on five public datasets and a local attribution map (LAM) are analyzed. Experimental results demonstrate that the proposed ESTN achieves higher average PSNR, surpassing SRCNN, ELAN-light, SwinIR-light, and SMFANER+ models by 2.17dB, 0.13dB, 0.12dB, and 0.1dB, respectively, with LAM further confirming its larger receptive field. ESTN delivers improved quality of SR images. The source code can be found at https://github.com/huangyuming2021/ESTN.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes