CV · Jan 5, 2023

Skip-Attention: Improving Vision Transformers by Paying Less Attention

arXiv:2301.02240v2 · 44 citations · h-index: 81
AI Analysis

This work addresses compute efficiency for users of vision transformers, but the contribution is incremental: it builds on existing methods for reducing redundancy.

The paper tackles the computational inefficiency of vision transformers by identifying redundancy in self-attention across layers. It proposes SkipAt, which reuses attention computations from earlier layers, and reports improved throughput at the same or higher accuracy on tasks such as image classification and semantic segmentation.

This work aims to improve the efficiency of vision transformers (ViT). While ViTs use computationally expensive self-attention operations in every layer, we identify that these operations are highly correlated across layers -- a key redundancy that causes unnecessary computations. Based on this observation, we propose SkipAt, a method to reuse self-attention computation from preceding layers to approximate attention at one or more subsequent layers. To ensure that reusing self-attention blocks across layers does not degrade the performance, we introduce a simple parametric function, which outperforms the baseline transformer's performance while running computationally faster. We show the effectiveness of our method in image classification and self-supervised learning on ImageNet-1K, semantic segmentation on ADE20K, image denoising on SIDD, and video denoising on DAVIS. We achieve improved throughput at the same-or-higher accuracy levels in all these tasks.
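
The reuse idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract only says the parametric function is "simple", so the single linear map used here (`skip_attention`, `w_phi`, `b_phi`) is a hypothetical stand-in, and the single-head attention, shapes, and random weights are assumptions made for illustration. Layer l computes full self-attention; layer l+1 skips the token-by-token score matrix and instead transforms the output cached from layer l.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(z, w_q, w_k, w_v):
    """Single-head self-attention over tokens z of shape (n_tokens, dim)."""
    q, k, v = z @ w_q, z @ w_k, z @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_tokens, n_tokens)
    return softmax(scores) @ v


def skip_attention(prev_attn_out, w_phi, b_phi):
    """Hypothetical parametric function: a linear map applied to the
    previous layer's attention output. No score matrix is formed."""
    return prev_attn_out @ w_phi + b_phi


rng = np.random.default_rng(0)
n_tokens, dim = 16, 8  # illustrative sizes, not taken from the paper
z = rng.standard_normal((n_tokens, dim))
w_q, w_k, w_v = (0.1 * rng.standard_normal((dim, dim)) for _ in range(3))

# Identity initialisation (an assumption): the skipped layer starts out
# by copying the cached attention output unchanged.
w_phi, b_phi = np.eye(dim), np.zeros(dim)

# Layer l: full self-attention with a residual connection.
attn_l = self_attention(z, w_q, w_k, w_v)
z_l = z + attn_l

# Layer l+1: reuse attn_l instead of recomputing attention.
attn_next = skip_attention(attn_l, w_phi, b_phi)
z_next = z_l + attn_next
```

In this sketch the skipped layer costs one (n_tokens, dim) by (dim, dim) matrix product, whereas full attention also builds an (n_tokens, n_tokens) score matrix, which is where the throughput gain would come from. In a real model `w_phi` and `b_phi` would be trained jointly with the rest of the network.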
