HiTSR: A Hierarchical Transformer for Reference-based Super-Resolution
This work targets image quality in super-resolution, with applications such as photography and medical imaging. The contribution is incremental, since it builds on existing attention mechanisms and transformer architectures.
The paper tackles reference-based image super-resolution by proposing HiTSR, a hierarchical transformer model that learns matching correspondences from high-resolution reference images to enhance low-resolution inputs, achieving state-of-the-art results with PSNR/SSIM values of 30.24/0.821 on the SUN80 dataset.
In this paper, we propose HiTSR, a hierarchical transformer model for reference-based image super-resolution, which enhances low-resolution input images by learning matching correspondences from high-resolution reference images. Diverging from existing multi-network, multi-stage approaches, we streamline the architecture and training pipeline by incorporating the double attention block from the GAN literature. The model processes the two visual streams independently and fuses self-attention and cross-attention blocks through a gating attention strategy. It also integrates a squeeze-and-excitation module to capture global context from the input images, facilitating long-range spatial interactions within window-based attention blocks. Long skip connections between shallow and deep layers further enhance information flow. Our model demonstrates superior performance across three datasets: SUN80, Urban100, and Manga109. Specifically, on the SUN80 dataset, our model achieves PSNR/SSIM values of 30.24/0.821. The transformer-based model attains state-of-the-art results without the need for purpose-built subnetworks, knowledge distillation, or multi-stage training, underscoring the effectiveness of attention mechanisms in reference-based image super-resolution.
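To make the gated fusion of self-attention and cross-attention more concrete, the following is a minimal, dependency-free sketch and not the HiTSR implementation. It assumes plain scaled dot-product attention without learned projections, no windowing, and a single scalar sigmoid gate; the abstract does not specify the actual gating form. The names `attention`, `gated_fusion`, and `gate_logit` are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of token vectors.

    Assumption: no learned Q/K/V projections, to keep the sketch short.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


def gated_fusion(lr_tokens, ref_tokens, gate_logit):
    """Blend self-attention (LR only) with cross-attention (LR queries, Ref keys/values).

    Assumption: a single scalar gate g = sigmoid(gate_logit); a trained model
    would learn the gate, and HiTSR's exact formulation may differ.
    """
    self_out = attention(lr_tokens, lr_tokens, lr_tokens)
    cross_out = attention(lr_tokens, ref_tokens, ref_tokens)
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    return [
        [(1.0 - g) * s + g * c for s, c in zip(s_row, c_row)]
        for s_row, c_row in zip(self_out, cross_out)
    ]


# Toy example: two low-resolution tokens and three reference tokens, each 2-D.
lr_tokens = [[1.0, 0.0], [0.0, 1.0]]
ref_tokens = [[2.0, 2.0], [0.0, 0.0], [1.0, -1.0]]
fused = gated_fusion(lr_tokens, ref_tokens, gate_logit=0.0)  # g = 0.5, equal blend
```

In this sketch, a strongly negative `gate_logit` reduces the output to self-attention on the low-resolution stream, while a strongly positive one relies almost entirely on the reference stream. This is one simple way a gate can control how much reference information is transferred.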