CV · Apr 9, 2024

Efficient Concertormer for Image Deblurring and Beyond

Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien, Ming-Hsuan Yang

arXiv:2404.06135v3 · 2.0 · h-index: 3

Originality: Highly original

AI Analysis

This work addresses the unaffordable computational cost of high-resolution image processing tasks such as deblurring, offering a more efficient solution for researchers and practitioners in computer vision.

The paper tackles the high computational cost of Transformers in high-resolution vision tasks by introducing Concertormer with a novel Concerto Self-Attention mechanism, achieving complexity linear in image size and performing favorably against state-of-the-art methods in image deblurring and related tasks.

The Transformer architecture has achieved remarkable success in natural language processing and high-level vision tasks over the past few years. However, the complexity of self-attention is inherently quadratic in the size of the image, leading to unaffordable computational costs for high-resolution vision tasks. In this paper, we introduce Concertormer, featuring a novel Concerto Self-Attention (CSA) mechanism designed for image deblurring. The proposed CSA divides self-attention into two distinct components: one emphasizes general, global correspondence and the other concentrates on specific, local correspondence. By retaining partial information in additional dimensions that are independent of the self-attention calculations, our method effectively captures global contextual representations with complexity linear in the image size. To effectively leverage the additional dimensions, we present a Cross-Dimensional Communication module, which linearly combines attention maps and thus enhances expressiveness. Moreover, we merge the two-stage Transformer design into a single stage using the proposed gated-dconv MLP architecture. While our primary objective is single-image motion deblurring, extensive quantitative and qualitative evaluations demonstrate that our approach performs favorably against state-of-the-art methods on other tasks, such as deraining and deblurring with JPEG artifacts. The source code and trained models will be made publicly available.
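The abstract does not spell out how CSA reaches linear complexity, so the sketch below is only a generic point of comparison and is *not* the paper's CSA. It uses channel-wise ("transposed") attention, a technique swapped in for illustration, in which the attention map is C×C instead of N×N (N = H·W pixels), so the cost grows linearly with N. The helper names, the fixed 1/sqrt(N) scaling, and the 1280×720, 48-channel example sizes are all assumptions chosen for this illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def channel_attention(q, k, v):
    """Channel-wise ("transposed") attention; NOT the paper's CSA.

    q, k, v are C x N matrices: C channels, N = H * W pixels.
    The attention map is C x C, so the work is O(C^2 * N),
    i.e. linear in the number of pixels.
    """
    n = len(q[0])
    scale = 1.0 / math.sqrt(n)  # assumed fixed scaling, chosen for illustration
    logits = [[scale * x for x in row] for row in matmul(q, transpose(k))]
    attn = softmax_rows(logits)  # C x C, each row sums to 1
    return matmul(attn, v)  # C x N


def attention_mult_adds(num_pixels, channels):
    """Multiply-adds for Q·K^T plus A·V (projections ignored)."""
    spatial = 2 * num_pixels * num_pixels * channels      # N x N attention map
    channel_wise = 2 * channels * channels * num_pixels   # C x C attention map
    return spatial, channel_wise


if __name__ == "__main__":
    h, w, c = 720, 1280, 48  # illustrative sizes, not taken from the paper
    spatial, channel_wise = attention_mult_adds(h * w, c)
    print(f"spatial:      {spatial:.3e}")
    print(f"channel-wise: {channel_wise:.3e}")
    print(f"ratio:        {spatial // channel_wise}")  # equals N / C = 19200
```

The cost ratio between the two forms is exactly N/C, so for a 1280×720 frame with 48 channels spatial self-attention needs 19,200 times more multiply-adds than the channel-wise variant. This gap is the sense in which quadratic self-attention becomes unaffordable at high resolution. Channel-wise attention gives up explicit pixel-to-pixel correspondence to get there, which is the kind of trade-off the paper's global/local split in CSA is described as addressing.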

