Plug-and-Play Linear Attention for Pre-trained Image and Video Restoration Models
This work addresses the efficiency of attention in real-time and resource-constrained computer vision settings, though the contribution is incremental in that it builds on existing transformer architectures.
The paper tackles the computational bottleneck of multi-head self-attention in image and video restoration models by proposing PnP-Nystra, a plug-and-play linear attention module that requires no retraining. It achieves a 2-4x speed-up on GPU and a 2-5x speed-up on CPU, with a maximum PSNR drop of 1.5 dB across tasks such as denoising, deblurring, and super-resolution.
Multi-head self-attention (MHSA) has become a core component in modern computer vision models. However, its quadratic complexity with respect to input length poses a significant computational bottleneck in real-time and resource-constrained environments. We propose PnP-Nystra, a Nyström-based linear approximation of self-attention, developed as a plug-and-play (PnP) module that can be integrated into pre-trained image and video restoration models without retraining. As a drop-in replacement for MHSA, PnP-Nystra accelerates various window-based transformer architectures, including SwinIR, Uformer, and RVRT. Our experiments across diverse image and video restoration tasks, including denoising, deblurring, and super-resolution, demonstrate that PnP-Nystra achieves a 2-4x speed-up on an NVIDIA RTX 4090 GPU and a 2-5x speed-up on CPU inference. Despite these significant gains, the method incurs a maximum PSNR drop of only 1.5 dB across all evaluated tasks. To the best of our knowledge, we are the first to demonstrate a linear attention module functioning as a training-free substitute for MHSA in restoration models.
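As a rough illustration of what a Nyström-based approximation of self-attention can look like, the sketch below follows the generic Nyströmformer-style formulation for a single head and a single window. The function names, the segment-mean landmark selection, and the use of an exact pseudo-inverse are assumptions made for illustration and are not taken from the paper; PnP-Nystra's actual design may differ.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def exact_attention(q, k, v):
    """Standard softmax attention; builds the full (n, n) attention matrix."""
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v


def nystrom_attention(q, k, v, num_landmarks):
    """Nystrom-style approximation of softmax attention (illustrative sketch).

    q, k, v: arrays of shape (n, d) for one head and one window.
    num_landmarks: number of landmark tokens m; assumed here to divide n.
    """
    n, d = q.shape
    if n % num_landmarks != 0:
        raise ValueError("num_landmarks must divide the number of tokens")
    seg = n // num_landmarks
    # Assumption: landmarks are means over contiguous token segments.
    q_l = q.reshape(num_landmarks, seg, d).mean(axis=1)  # (m, d)
    k_l = k.reshape(num_landmarks, seg, d).mean(axis=1)  # (m, d)

    scale = 1.0 / np.sqrt(d)
    f = softmax(q @ k_l.T * scale)    # (n, m): queries vs. landmark keys
    a = softmax(q_l @ k_l.T * scale)  # (m, m): landmark queries vs. landmark keys
    b = softmax(q_l @ k.T * scale)    # (m, n): landmark queries vs. keys

    # Multiply right to left so that no (n, n) matrix is ever formed.
    return f @ (np.linalg.pinv(a) @ (b @ v))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((64, 32)) for _ in range(3))
    approx = nystrom_attention(q, k, v, num_landmarks=8)
    exact = exact_attention(q, k, v)
    print("max abs difference:", np.abs(approx - exact).max())
```

With m landmarks the matrix products cost O(n·m·d) instead of O(n²·d), which is where the linear scaling in the number of tokens n comes from when m is held fixed. Because only the attention computation changes and the query, key, and value projections are untouched, a module of this kind can in principle reuse pre-trained weights as they are.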