CV | Jul 23, 2025

DiNAT-IR: Exploring Dilated Neighborhood Attention for High-Quality Image Restoration

arXiv:2507.17892v1 | 1 citation | h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses the efficiency-quality trade-off in image restoration for low-level computer vision, but the advance is incremental because it builds on existing attention mechanisms.

The paper tackles the high computational cost of Transformers in image restoration by proposing DiNAT-IR, which combines dilated neighborhood attention with a channel-aware module to balance global context and local precision, and achieves competitive results on multiple benchmarks.

Transformers, with their self-attention mechanisms for modeling long-range dependencies, have become a dominant paradigm in image restoration tasks. However, the high computational cost of self-attention limits scalability to high-resolution images, making efficiency-quality trade-offs a key research focus. To address this, Restormer employs channel-wise self-attention, which computes attention across channels instead of spatial dimensions. While effective, this approach may overlook localized artifacts that are crucial for high-quality image restoration. To bridge this gap, we explore Dilated Neighborhood Attention (DiNA) as a promising alternative, inspired by its success in high-level vision tasks. DiNA balances global context and local precision by integrating sliding-window attention with mixed dilation factors, effectively expanding the receptive field without excessive overhead. However, our preliminary experiments indicate that directly applying this global-local design to the classic deblurring task hinders accurate visual restoration, primarily due to the constrained global context understanding within local attention. To address this, we introduce a channel-aware module that complements local attention, effectively integrating global context without sacrificing pixel-level precision. The proposed DiNAT-IR, a Transformer-based architecture specifically designed for image restoration, achieves competitive results across multiple benchmarks, offering a high-quality solution for diverse low-level computer vision problems.
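To make the sliding-window idea concrete, the following is a minimal sketch of dilated neighborhood attention on a 1D token sequence, written in plain Python. It is not the authors' implementation. The function names (`dilated_neighbors`, `neighborhood_attention`), the single-head 1D setting, and the border handling (shifting the window so that every token still attends to exactly `kernel` positions) are assumptions made for illustration. The paper itself operates on 2D feature maps and pairs this attention with a channel-aware module that is not shown here.

```python
import math


def dilated_neighbors(i, n, kernel, dilation):
    """Return the `kernel` positions nearest to i that share i's residue
    modulo `dilation` (hypothetical helper, 1D simplification)."""
    r = i % dilation
    # Number of positions in [0, n) with the same residue as i.
    group_len = (n - r + dilation - 1) // dilation
    if group_len < kernel:
        raise ValueError("sequence too short for this kernel/dilation")
    g = i // dilation  # index of i inside its residue group
    # Centre the window on i, shifting it inward at the borders.
    start = min(max(g - kernel // 2, 0), group_len - kernel)
    return [r + (start + j) * dilation for j in range(kernel)]


def neighborhood_attention(q, k, v, kernel, dilation):
    """Single-head attention where token i only attends to its dilated
    neighborhood. q, k, v are lists of equal-length float vectors."""
    n, dim = len(q), len(q[0])
    out = []
    for i in range(n):
        idx = dilated_neighbors(i, n, kernel, dilation)
        # Scaled dot-product scores against the neighbors only.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(dim)
            for j in idx
        ]
        # Numerically stable softmax over the neighborhood.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([
            sum(weights[t] / z * v[idx[t]][c] for t in range(kernel))
            for c in range(len(v[0]))
        ])
    return out


# With 8 tokens and a kernel of 3, dilation 1 covers adjacent tokens,
# while dilation 2 covers the same number of tokens over a wider span.
print(dilated_neighbors(4, 8, 3, 1))  # [3, 4, 5]
print(dilated_neighbors(4, 8, 3, 2))  # [2, 4, 6]
```

Each token attends to `kernel` positions regardless of sequence length, so the cost grows linearly with the number of tokens, and raising `dilation` widens the receptive field at the same cost. Using different dilation values in different layers is one way to read the "mixed dilation factors" described above.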

