Reciprocal Attention Mixing Transformer for Lightweight Image Restoration
This work addresses computational efficiency and feature integration in image restoration, which matters for real-time and resource-constrained applications. It appears incremental, however, since it builds on existing Transformer and MobileNet architectures.
The authors tackle excessive parameter counts and limited receptive fields in Transformer-based image restoration by proposing RAMiT, a lightweight network built on reciprocal attention mixing. RAMiT achieves state-of-the-art performance on lightweight image restoration tasks such as super-resolution and denoising.
Although many recent works have made advancements in the image restoration (IR) field, they often suffer from an excessive number of parameters. Another issue is that most Transformer-based IR methods focus only on either local or global features, leading to limited receptive fields or deficient parameter issues. To address these problems, we propose a lightweight IR network, Reciprocal Attention Mixing Transformer (RAMiT). It employs our proposed dimensional reciprocal attention mixing Transformer (D-RAMiT) blocks, which compute bi-dimensional (spatial and channel) self-attentions in parallel with different numbers of attention heads. The two attentions complement each other's drawbacks and are then mixed. Additionally, we introduce a hierarchical reciprocal attention mixing (H-RAMi) layer that compensates for pixel-level information losses and utilizes semantic information while maintaining an efficient hierarchical structure. Furthermore, we revisit and modify MobileNet V1 and V2 to attach efficient convolutions to our proposed components. The experimental results demonstrate that RAMiT achieves state-of-the-art performance on multiple lightweight IR tasks, including super-resolution, color denoising, grayscale denoising, low-light enhancement, and deraining. Code is available at https://github.com/rami0205/RAMiT.
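To make the idea of bi-dimensional attention concrete, the following is a minimal NumPy sketch of spatial and channel self-attention computed on the same input and then mixed. It is not the RAMiT implementation: the function names, the single-head simplification, the sqrt-based scaling, and the concatenate-then-project mixing step are all assumptions chosen for illustration, and the reciprocal interaction between the two branches described in the abstract is not modeled here.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_self_attention(x, wq, wk, wv):
    # x: (N, C) with N spatial tokens and C channels.
    # The attention map is (N, N): every token attends to every token.
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return attn @ v  # (N, C)


def channel_self_attention(x, wq, wk, wv):
    # The attention map is (C, C): every channel attends to every channel,
    # so its size is independent of the number of tokens N.
    q, k, v = x @ wq, x @ wk, x @ wv
    # Scaling by sqrt(N) is an assumption made for this sketch.
    attn = softmax(q.T @ k / np.sqrt(q.shape[0]), axis=-1)
    return v @ attn.T  # (N, C)


def bi_dimensional_mix(x, spatial_weights, channel_weights, w_mix):
    # Hypothetical mixing step: run both branches in parallel,
    # concatenate along the channel axis, then project back to C channels.
    s = spatial_self_attention(x, *spatial_weights)
    c = channel_self_attention(x, *channel_weights)
    return np.concatenate([s, c], axis=-1) @ w_mix  # (N, C)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_tokens, n_channels = 16, 8
    x = rng.standard_normal((n_tokens, n_channels))
    spatial_w = [rng.standard_normal((n_channels, n_channels)) * 0.1 for _ in range(3)]
    channel_w = [rng.standard_normal((n_channels, n_channels)) * 0.1 for _ in range(3)]
    w_mix = rng.standard_normal((2 * n_channels, n_channels)) * 0.1
    out = bi_dimensional_mix(x, spatial_w, channel_w, w_mix)
    print(out.shape)  # (16, 8)
```

The sketch shows why the two branches are complementary in cost and scope: the spatial map grows as N x N and captures relations between positions, whereas the channel map stays at C x C regardless of image size and aggregates over all positions.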