IV, CV
Nov 13, 2025

From Attention to Frequency: Integration of Vision Transformer and FFT-ReLU for Enhanced Image Deblurring

arXiv:2511.10806v1
h-index: 2
Originality: Incremental advance
AI Analysis

This paper addresses image deblurring for computer vision applications and offers a practical, generalizable paradigm for real-world image restoration. It appears incremental, however, because it builds on existing methods such as ViTs and frequency-domain techniques.

The paper tackles image deblurring by proposing a dual-domain architecture that integrates Vision Transformers with an FFT-ReLU module. It reports superior PSNR, SSIM, and perceptual quality compared to state-of-the-art models on benchmark datasets.

Image deblurring is vital in computer vision, aiming to recover sharp images from blurry ones caused by motion or camera shake. While deep learning approaches such as CNNs and Vision Transformers (ViTs) have advanced this field, they often struggle with complex or high-resolution blur and computational demands. We propose a new dual-domain architecture that unifies Vision Transformers with a frequency-domain FFT-ReLU module, explicitly bridging spatial attention modeling and frequency sparsity. In this structure, the ViT backbone captures local and global dependencies, while the FFT-ReLU component enforces frequency-domain sparsity to suppress blur-related artifacts and preserve fine details. Extensive experiments on benchmark datasets demonstrate that this architecture achieves superior PSNR, SSIM, and perceptual quality compared to state-of-the-art models. Quantitative metrics, qualitative comparisons, and human preference evaluations all confirm its effectiveness, establishing a practical and generalizable paradigm for real-world image restoration.
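The abstract does not spell out how the FFT-ReLU component is computed. As a rough illustration only, the sketch below assumes one common reading of the idea: take the 2D FFT of an image, apply ReLU separately to the real and imaginary parts of the spectrum (which zeroes the negative coefficients and so sparsifies it), and transform back. The function name `fft_relu` and the choice to keep only the real part of the inverse transform are assumptions for illustration, not the authors' implementation, and the ViT backbone is omitted entirely.

```python
import numpy as np


def fft_relu(image: np.ndarray) -> np.ndarray:
    """Hypothetical FFT-ReLU step: rectify the spectrum, then invert it.

    Assumption: ReLU is applied independently to the real and imaginary
    parts of the 2D spectrum. The paper may define the module differently.
    """
    spectrum = np.fft.fft2(image)
    # ReLU on each part sets negative coefficients to zero.
    rectified = np.maximum(spectrum.real, 0.0) + 1j * np.maximum(spectrum.imag, 0.0)
    # Rectification breaks conjugate symmetry, so the inverse transform is
    # complex in general; this sketch keeps only its real part.
    return np.fft.ifft2(rectified).real


# Toy input standing in for a blurry grayscale image.
rng = np.random.default_rng(0)
blurry = rng.random((8, 8))
out = fft_relu(blurry)
```

In a learned model this operation would typically sit inside a residual block alongside trainable layers; here it is shown in isolation to make the frequency-sparsity step concrete.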

