CV · Jun 23, 2025

Enhancing Image Restoration Transformer via Adaptive Translation Equivariance

Peking U
arXiv:2506.18520v1 · 5 citations · h-index: 12
Originality: Incremental advance
AI Analysis

This paper addresses a fundamental bottleneck in image restoration for computer vision applications and offers an incremental improvement over existing transformer methods.

The paper tackles the loss of translation equivariance in image restoration transformers, which harms training convergence and generalization. It proposes an adaptive sliding indexing mechanism that efficiently combines local and global attention. The resulting network, TEAFormer, shows improved effectiveness, convergence, and generalization across image restoration tasks.

Translation equivariance is a fundamental inductive bias in image restoration, ensuring that translated inputs produce translated outputs. Attention mechanisms in modern restoration transformers undermine this property, adversely impacting both training convergence and generalization. To alleviate this issue, we propose two key strategies for incorporating translation equivariance: slide indexing and component stacking. Slide indexing maintains operator responses at fixed positions, with sliding window attention being a notable example, while component stacking enables the arrangement of translation-equivariant operators in parallel or sequentially, thereby building complex architectures while preserving translation equivariance. However, these strategies still create a dilemma in model design between the high computational cost of self-attention and the fixed receptive field associated with sliding window attention. To address this, we develop an adaptive sliding indexing mechanism to efficiently select key-value pairs for each query, which are then concatenated in parallel with globally aggregated key-value pairs. The designed network, called the Translation Equivariance Adaptive Transformer (TEAFormer), is assessed across a variety of image restoration tasks. The results highlight its superiority in terms of effectiveness, training convergence, and generalization.
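
The equivariance property described in the abstract can be checked numerically. The NumPy sketch below is an illustration only, not the authors' TEAFormer code. It assumes a 1-D signal with circular boundary handling and random projection weights, and the names `sliding_window_attention`, `blocked_window_attention`, and `equivariance_error` are hypothetical. It compares sliding window attention (an instance of slide indexing) with attention over fixed, non-overlapping windows, and then stacks equivariant operators sequentially and in parallel.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sliding_window_attention(x, wq, wk, wv, radius):
    """Each query attends to keys at fixed relative offsets (circular wrap assumed)."""
    n = x.shape[0]
    q, k, v = x @ wq, x @ wk, x @ wv
    offsets = np.arange(-radius, radius + 1)
    idx = (np.arange(n)[:, None] + offsets[None, :]) % n  # (n, 2*radius + 1)
    scores = np.einsum("nd,nwd->nw", q, k[idx]) / np.sqrt(q.shape[-1])
    return np.einsum("nw,nwd->nd", softmax(scores), v[idx])

def blocked_window_attention(x, wq, wk, wv, window):
    """Attention inside non-overlapping windows anchored at absolute positions."""
    n = x.shape[0]
    assert n % window == 0, "sequence length must be a multiple of the window size"
    q, k, v = x @ wq, x @ wk, x @ wv
    d = q.shape[-1]
    q, k, v = (t.reshape(n // window, window, d) for t in (q, k, v))
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    return (softmax(scores) @ v).reshape(n, d)

def equivariance_error(op, x, shift):
    """Max |op(shift(x)) - shift(op(x))|; zero up to rounding for an equivariant op."""
    return np.abs(op(np.roll(x, shift, axis=0)) - np.roll(op(x), shift, axis=0)).max()

rng = np.random.default_rng(0)
n, c = 16, 8
x = rng.standard_normal((n, c))
# Random projections stand in for learned weights (an assumption of this sketch).
wq, wk, wv = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3))

sliding = lambda t: sliding_window_attention(t, wq, wk, wv, radius=2)
blocked = lambda t: blocked_window_attention(t, wq, wk, wv, window=4)
# Component stacking: sequential composition plus a parallel branch.
stacked = lambda t: sliding(sliding(t)) + sliding(t)

err_sliding = equivariance_error(sliding, x, shift=1)
err_blocked = equivariance_error(blocked, x, shift=1)
err_blocked_aligned = equivariance_error(blocked, x, shift=4)
err_stacked = equivariance_error(stacked, x, shift=1)

print("sliding window, shift 1:", err_sliding)
print("blocked window, shift 1:", err_blocked)
print("blocked window, shift 4:", err_blocked_aligned)
print("stacked sliding, shift 1:", err_stacked)
```

In this sketch the sliding and stacked operators commute with the shift up to floating-point rounding, while the blocked variant does so only when the shift is a multiple of the window size. The adaptive key-value selection and the globally aggregated key-value pairs described in the abstract are not modeled here.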

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
