CV · May 5, 2021

Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks

arXiv:2105.02358v2 · 689 citations
AI Analysis

This work addresses efficiency and scalability issues for researchers and practitioners who use attention-based models in computer vision, though it is an incremental improvement over existing attention mechanisms.

The paper tackles two limitations of self-attention in visual tasks: its quadratic complexity and its disregard for correlations between different samples. It proposes external attention, a linear-complexity mechanism built from two linear layers that act as shared memories, and reports comparable or superior results on tasks such as image classification and object detection at lower computational cost.

Attention mechanisms, especially self-attention, have played an increasingly important role in deep feature representation for visual tasks. Self-attention updates the feature at each position by computing a weighted sum of features using pair-wise affinities across all positions to capture the long-range dependency within a single sample. However, self-attention has quadratic complexity and ignores potential correlation between different samples. This paper proposes a novel attention mechanism which we call external attention, based on two external, small, learnable, shared memories, which can be implemented easily by simply using two cascaded linear layers and two normalization layers; it conveniently replaces self-attention in existing popular architectures. External attention has linear complexity and implicitly considers the correlations between all data samples. We further incorporate the multi-head mechanism into external attention to provide an all-MLP architecture, external attention MLP (EAMLP), for image classification. Extensive experiments on image classification, object detection, semantic segmentation, instance segmentation, image generation, and point cloud analysis reveal that our method provides results comparable or superior to the self-attention mechanism and some of its variants, with much lower computational and memory costs.
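The mechanism described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `external_attention`, `mem_k`, and `mem_v` are hypothetical, the memories are random rather than learned, and the two normalization steps (a softmax over positions followed by an L1 normalization over memory slots) are one plausible reading of "two normalization layers".

```python
import numpy as np

def external_attention(features, mem_k, mem_v, eps=1e-9):
    """Sketch of external attention for one sample.

    features: (N, d) array, one row per position (pixel, patch, or point).
    mem_k, mem_v: (S, d) arrays, the two shared external memories.
    Returns an (N, d) array of updated features.
    """
    # First linear layer: affinity of every position to every memory slot.
    attn = features @ mem_k.T                       # shape (N, S)

    # Assumed first normalization: softmax over the position axis.
    attn = attn - attn.max(axis=0, keepdims=True)   # for numerical stability
    attn = np.exp(attn)
    attn = attn / attn.sum(axis=0, keepdims=True)

    # Assumed second normalization: L1 over the memory-slot axis.
    attn = attn / (attn.sum(axis=1, keepdims=True) + eps)

    # Second linear layer: read updated features out of the value memory.
    return attn @ mem_v                             # shape (N, d)

# Illustrative sizes only: N positions, d channels, S memory slots.
rng = np.random.default_rng(0)
N, d, S = 196, 64, 8
features = rng.standard_normal((N, d))
mem_k = 0.1 * rng.standard_normal((S, d))
mem_v = 0.1 * rng.standard_normal((S, d))

out = external_attention(features, mem_k, mem_v)
print(out.shape)  # (196, 64)
```

Both matrix products cost on the order of N·S·d operations, so with a fixed, small S the cost grows linearly with the number of positions N, whereas the N×N affinity matrix of self-attention grows quadratically. Because `mem_k` and `mem_v` do not depend on the input sample, the same memories would be shared across the whole dataset during training, which is how the abstract's claim about implicitly considering correlations between samples can be read.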

Code Implementations · 7 repos

Data from Papers with Code (CC-BY-SA-4.0)

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
