A Low-Resolution Image is Worth 1x1 Words: Enabling Fine Image Super-Resolution with Transformers and TaylorShift
This work addresses efficiency and detail restoration in image super-resolution, which matters for applications such as medical imaging and photography. The contribution appears incremental, however, since it builds on existing transformer architectures.
The paper tackles the scalability limitations of transformer-based super-resolution models by proposing TaylorIR, which uses 1x1 patch embeddings and a TaylorShift attention mechanism to achieve state-of-the-art performance while reducing memory consumption by up to 60%.
Transformer-based architectures have recently advanced the image reconstruction quality of super-resolution (SR) models. Yet, their scalability remains limited by quadratic attention costs and coarse patch embeddings that weaken pixel-level fidelity. We propose TaylorIR, a plug-and-play framework that enforces 1x1 patch embeddings for true pixel-wise reasoning and replaces conventional self-attention with TaylorShift, a Taylor-series-based attention mechanism enabling full token interactions with near-linear complexity. Across multiple SR benchmarks, TaylorIR delivers state-of-the-art performance while reducing memory consumption by up to 60%, effectively bridging the gap between fine-grained detail restoration and efficient transformer scaling.
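The two ideas in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and all function names below are hypothetical. A 1x1 patch embedding is a per-pixel linear projection, so an H×W image yields H·W tokens. This token count is why quadratic attention becomes expensive at pixel granularity. The sketch assumes that TaylorShift replaces exp in softmax with its second-order Taylor expansion 1 + s + s²/2, which is always positive. That expansion factorizes as φ(q)·φ(k) with φ(x) = [1, x, vec(x⊗x)/√2], so attention can be computed without materializing the N×N matrix. The sketch omits any additional normalization or scaling the original TaylorShift formulation may apply for numerical stability.

```python
import numpy as np

def patch_embed_1x1(image, weight):
    """Hypothetical 1x1 patch embedding: every pixel becomes one token.

    image: (H, W, C) array; weight: (C, d) projection matrix.
    Returns an (H*W, d) token matrix.
    """
    h, w, c = image.shape
    return image.reshape(h * w, c) @ weight

def taylor_attention_direct(q, k, v):
    """Quadratic reference: softmax with exp replaced by 1 + s + s^2/2."""
    s = q @ k.T                          # (N, N) similarity matrix
    a = 1.0 + s + 0.5 * s ** 2           # equals 0.5*(s+1)^2 + 0.5, so always > 0
    a = a / a.sum(axis=1, keepdims=True)  # row-normalize like softmax
    return a @ v

def taylor_feature_map(x):
    """Feature map phi with phi(q) . phi(k) == 1 + q.k + (q.k)^2 / 2."""
    n, d = x.shape
    # Flattened outer product x_i * x_j, scaled so its dot product gives (q.k)^2 / 2.
    outer = np.einsum("ni,nj->nij", x, x).reshape(n, d * d) / np.sqrt(2.0)
    return np.concatenate([np.ones((n, 1)), x, outer], axis=1)

def taylor_attention_linear(q, k, v):
    """Same output without the N x N matrix: cost grows linearly in N
    (roughly N * d^2 * d_v operations instead of N^2 * d)."""
    fq = taylor_feature_map(q)           # (N, 1 + d + d^2)
    fk = taylor_feature_map(k)           # (N, 1 + d + d^2)
    kv = fk.T @ v                        # (1 + d + d^2, d_v), shared by all queries
    normalizer = fq @ fk.sum(axis=0)     # (N,) row sums of the implicit matrix
    return (fq @ kv) / normalizer[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((8, 8, 3))                  # toy 8x8 RGB input
    tokens = patch_embed_1x1(image, 0.5 * rng.standard_normal((3, 4)))
    wq, wk, wv = (0.5 * rng.standard_normal((4, 4)) for _ in range(3))
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv         # 64 tokens, d = 4
    out_direct = taylor_attention_direct(q, k, v)
    out_linear = taylor_attention_linear(q, k, v)
    print(np.allclose(out_direct, out_linear))
```

Both functions compute the same result. The linear form has a feature dimension of 1 + d + d², so under these assumptions it pays off only when the number of tokens N is large relative to d². That is the regime that 1x1 patch embeddings create.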