CV AI LG MM | Nov 15, 2024

A Low-Resolution Image is Worth 1x1 Words: Enabling Fine Image Super-Resolution with Transformers and TaylorShift

Sanath Budakegowdanadoddi Nagaraju, Brian Bernhard Moser, Tobias Christian Nauen, Stanislav Frolov, Federico Raue, Andreas Dengel

arXiv:2411.10231v2 | 2.0 | h-index: 13

Originality: Highly original

AI Analysis

This work addresses efficiency and detail restoration in image super-resolution, which is important for applications like medical imaging or photography, though it appears incremental as it builds on existing transformer architectures.

The paper tackles the scalability limitations of transformer-based super-resolution models by proposing TaylorIR, which uses 1x1 patch embeddings and a TaylorShift attention mechanism to achieve state-of-the-art performance while reducing memory consumption by up to 60%.

Transformer-based architectures have recently advanced the image reconstruction quality of super-resolution (SR) models. Yet, their scalability remains limited by quadratic attention costs and coarse patch embeddings that weaken pixel-level fidelity. We propose TaylorIR, a plug-and-play framework that enforces 1x1 patch embeddings for true pixel-wise reasoning and replaces conventional self-attention with TaylorShift, a Taylor-series-based attention mechanism enabling full token interactions with near-linear complexity. Across multiple SR benchmarks, TaylorIR delivers state-of-the-art performance while reducing memory consumption by up to 60%, effectively bridging the gap between fine-grained detail restoration and efficient transformer scaling.
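
The two ideas in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes a second-order Taylor expansion of the exponential (1 + s + s^2/2) in place of softmax, and the helper names `num_tokens`, `taylor_attention_direct`, and `taylor_attention_linear` are hypothetical. Scaling, normalization details, and multi-head handling in the actual TaylorShift may differ. The sketch shows why 1x1 patches inflate the token count, and how the Taylor form lets the N x N attention matrix be avoided by reordering the matrix products.

```python
import numpy as np

def num_tokens(height, width, patch):
    """Number of tokens when an image is split into patch x patch blocks."""
    return (height // patch) * (width // patch)

def taylor_attention_direct(Q, K, V):
    """Quadratic-cost reference: builds the full N x N weight matrix."""
    S = Q @ K.T                        # (N, N) attention scores
    W = 1.0 + S + 0.5 * S**2           # 2nd-order Taylor expansion of exp(S); always positive
    return (W @ V) / W.sum(axis=1, keepdims=True)

def taylor_attention_linear(Q, K, V):
    """Same output, but cost grows linearly with the number of tokens N."""
    N, d = Q.shape
    # Append a column of ones so the row normalizer is computed alongside the values.
    V1 = np.concatenate([V, np.ones((N, 1))], axis=1)
    # Row-wise outer products, using (q . k)^2 = <q (x) q, k (x) k>.
    Q2 = np.einsum("ni,nj->nij", Q, Q).reshape(N, d * d)
    K2 = np.einsum("ni,nj->nij", K, K).reshape(N, d * d)
    out = V1.sum(axis=0, keepdims=True)     # zeroth-order term (broadcast over rows)
    out = out + Q @ (K.T @ V1)              # first-order term, (d x dv) intermediate
    out = out + 0.5 * Q2 @ (K2.T @ V1)      # second-order term, (d^2 x dv) intermediate
    return out[:, :-1] / out[:, -1:]        # divide by the accumulated normalizer

# A 64x64 input yields 16x more tokens with 1x1 patches than with 4x4 patches.
print(num_tokens(64, 64, 1), num_tokens(64, 64, 4))

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(16, 4)) for _ in range(3))
print(np.allclose(taylor_attention_direct(Q, K, V), taylor_attention_linear(Q, K, V)))
```

The linear variant trades the N x N matrix for intermediates of size d^2, so it pays off when the token count is large relative to the head dimension, which is the regime that pixel-level tokens create.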
