IVCVApr 17, 2024

Training Transformer Models by Wavelet Losses Improves Quantitative and Visual Performance in Single Image Super-Resolution

arXiv:2404.11273v121 citationsh-index: 62024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Originality Incremental advance
AI Analysis

This work addresses the challenge of improving image quality in super-resolution for applications like medical imaging or photography, though it is incremental as it builds on existing hybrid transformer and wavelet loss methods.

The paper tackles the problem of limited global information capture and inadequate high-frequency detail preservation in Transformer-based single image super-resolution by introducing convolutional non-local sparse attention blocks and wavelet losses, achieving state-of-the-art PSNR results and superior visual performance on benchmark datasets.

Transformer-based models have achieved remarkable results in low-level vision tasks including image super-resolution (SR). However, early Transformer-based approaches that rely on self-attention within non-overlapping windows encounter challenges in acquiring global information. To activate more input pixels globally, hybrid attention models have been proposed. Moreover, training by solely minimizing pixel-wise RGB losses, such as L1, have been found inadequate for capturing essential high-frequency details. This paper presents two contributions: i) We introduce convolutional non-local sparse attention (NLSA) blocks to extend the hybrid transformer architecture in order to further enhance its receptive field. ii) We employ wavelet losses to train Transformer models to improve quantitative and subjective performance. While wavelet losses have been explored previously, showing their power in training Transformer-based SR models is novel. Our experimental results demonstrate that the proposed model provides state-of-the-art PSNR results as well as superior visual performance across various benchmark datasets.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes