PanFormer: a Transformer Based Model for Pan-sharpening
This work addresses image enhancement for remote sensing applications, representing an incremental improvement by applying a Transformer architecture to a specific domain task.
The authors tackle the pan-sharpening problem of generating high-resolution multi-spectral images from low-resolution multi-spectral inputs and their corresponding panchromatic images, proposing a Transformer-based model that outperforms many existing CNN-based methods on the GaoFen-2 and WorldView-3 datasets.
Pan-sharpening aims to produce a high-resolution (HR) multi-spectral (MS) image from a low-resolution (LR) MS image and its corresponding panchromatic (PAN) image acquired by the same satellite. Inspired by a recent trend in the deep learning community, we propose a novel Transformer-based model for pan-sharpening and explore the potential of the Transformer for image feature extraction and fusion. Following the successful development of vision transformers, we design a two-stream network that uses self-attention to extract modality-specific features from the PAN and MS modalities and applies a cross-attention module to merge the spectral and spatial features. The pan-sharpened image is produced from the enhanced fused features. Extensive experiments on GaoFen-2 and WorldView-3 images demonstrate that our Transformer-based model achieves impressive results and outperforms many existing CNN-based methods, which shows the great potential of introducing the Transformer to the pan-sharpening task. Code is available at https://github.com/zhysora/PanFormer.
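As a rough illustration of the two-stream idea described above, the following NumPy sketch tokenizes a PAN image and an LR MS image, applies self-attention within each stream, and then fuses the streams with cross-attention. It is not the released PanFormer code. The image sizes, patch sizes, embedding width, random projection matrices, and the choice of MS tokens as queries are all assumptions made for illustration, and the learned query/key/value projections, multi-head structure, and reconstruction head of a real model are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    """Scaled dot-product attention on token matrices of shape (n, d)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights

def to_tokens(image, patch):
    """Split an (H, W, C) image into non-overlapping patch tokens."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    x = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)

rng = np.random.default_rng(0)

# Hypothetical inputs: a 1-band PAN image and a 4-band MS image at 1/4 resolution.
pan = rng.random((64, 64, 1))
ms = rng.random((16, 16, 4))

# Patch sizes are chosen so both streams cover the same 8 x 8 spatial grid.
pan_tokens = to_tokens(pan, patch=8)  # (64 tokens, 64 values each)
ms_tokens = to_tokens(ms, patch=2)    # (64 tokens, 16 values each)

# Random linear embeddings stand in for learned ones (assumption).
d_model = 32
w_pan = rng.normal(scale=0.1, size=(pan_tokens.shape[1], d_model))
w_ms = rng.normal(scale=0.1, size=(ms_tokens.shape[1], d_model))
pan_feat = pan_tokens @ w_pan
ms_feat = ms_tokens @ w_ms

# Stream-specific self-attention: each modality attends only to itself.
pan_feat, _ = attention(pan_feat, pan_feat, pan_feat)
ms_feat, _ = attention(ms_feat, ms_feat, ms_feat)

# Cross-attention: MS (spectral) tokens query PAN (spatial) tokens.
fused, cross_weights = attention(ms_feat, pan_feat, pan_feat)
```

Here `fused` holds one feature vector per MS token, each a weighted mix of PAN features. In a full model, a further stage would map such fused features back to an HR MS image.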