Accurate Image Restoration with Attention Retractable Transformer
This work addresses a limitation of window-based self-attention in image restoration, which matters for applications requiring high-quality visual outputs, and represents an incremental improvement over existing methods.
The authors tackled the problem of restricted receptive fields in Transformer-based image restoration networks by proposing the Attention Retractable Transformer (ART), which integrates dense and sparse attention modules to enhance representation and achieve state-of-the-art performance on tasks like super-resolution, denoising, and JPEG artifact reduction.
Recently, Transformer-based image restoration networks have achieved promising improvements over convolutional neural networks due to parameter-independent global interactions. To lower computational cost, existing works generally limit self-attention computation to non-overlapping windows. However, each group of tokens is always drawn from a dense area of the image. We regard this as a dense attention strategy, since token interactions are restrained to dense regions. This strategy can result in restricted receptive fields. To address this issue, we propose the Attention Retractable Transformer (ART) for image restoration, which includes both dense and sparse attention modules in the network. The sparse attention module allows tokens from sparse areas to interact and thus provides a wider receptive field. Furthermore, alternating dense and sparse attention modules greatly enhances the representation ability of the Transformer while providing retractable attention on the input image. We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks. Experimental results validate that our proposed ART outperforms state-of-the-art methods on various benchmark datasets both quantitatively and visually. We also provide code and models at https://github.com/gladzhang/ART.
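The difference between the dense and sparse grouping described above can be illustrated with a small NumPy sketch. It is not taken from the ART repository. The function names (`dense_groups`, `sparse_groups`, `group_attention`), the `window` and `interval` parameters, and the rule that sparse groups collect tokens sharing the same row and column offset modulo `interval` are all assumptions made for illustration. The attention step omits learned projections, multiple heads, and position encodings.

```python
import numpy as np


def dense_groups(x, window):
    """Split an (H, W, C) feature map into non-overlapping window x window blocks.

    Returns an array of shape (num_groups, window * window, C).
    """
    h, w, c = x.shape
    assert h % window == 0 and w % window == 0, "H and W must be divisible by window"
    x = x.reshape(h // window, window, w // window, window, c)
    # Bring the two block indices to the front so each block becomes one group.
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, window * window, c)


def sparse_groups(x, interval):
    """Group tokens that share the same (row % interval, col % interval) offset.

    Assumed grouping rule: tokens in a group are `interval` pixels apart, so a
    group covers the whole feature map instead of one local block.
    Returns an array of shape (interval * interval, (H // interval) * (W // interval), C).
    """
    h, w, c = x.shape
    assert h % interval == 0 and w % interval == 0, "H and W must be divisible by interval"
    x = x.reshape(h // interval, interval, w // interval, interval, c)
    # Bring the two offset indices to the front so each offset pair becomes one group.
    x = x.transpose(1, 3, 0, 2, 4)
    return x.reshape(interval * interval, -1, c)


def group_attention(groups):
    """Scaled dot-product self-attention inside each group, with Q = K = V = tokens."""
    scores = groups @ groups.transpose(0, 2, 1) / np.sqrt(groups.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ groups


# Toy 8 x 8 map whose two channels store each token's (row, col) position.
coords = np.stack(
    np.meshgrid(np.arange(8), np.arange(8), indexing="ij"), axis=-1
).astype(float)

dense = dense_groups(coords, window=4)
sparse = sparse_groups(coords, interval=2)
```

In this toy setting both strategies form 4 groups of 16 tokens, so the attention cost per group is the same. The first dense group only contains rows 0 to 3, while the first sparse group contains rows 0, 2, 4, and 6, which is the wider receptive field the sparse module is meant to provide. Alternating the two groupings across layers would then mix local and long-range interactions.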