Group Equivariant Stand-Alone Self-Attention For Vision
This work addresses the need for more efficient and effective symmetry-aware models in computer vision; its contribution is incremental, since it builds on existing self-attention frameworks.
The paper tackles the problem of incorporating group equivariance into self-attention for vision tasks by defining group-invariant positional encodings, and it reports consistent improvements over non-equivariant self-attention networks on vision benchmarks.
We provide a general self-attention formulation that imposes equivariance to arbitrary symmetry groups. This is achieved by defining positional encodings that are invariant to the action of the group considered. Since the group acts on the positional encoding directly, group equivariant self-attention networks (GSA-Nets) are steerable by nature. Our experiments on vision benchmarks demonstrate consistent improvements of GSA-Nets over non-equivariant self-attention networks.
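The following minimal NumPy sketch shows why a group-invariant positional encoding yields an equivariant attention layer. It assumes a single attention head on a square grid and the group of 90° rotations. The positional term is a lookup table indexed by the squared Euclidean distance between two pixels. The names `positional_bias`, `self_attention`, and `table` are hypothetical, and `table` stands in for learned parameters. This is a simplified illustration of the invariance argument, not the GSA-Net construction from the paper.

```python
import numpy as np

def positional_bias(h, w, table):
    """Attention bias that depends only on the squared distance between pixels.

    Hypothetical illustration: `table` plays the role of a learned function of
    the (rotation-invariant) squared distance.
    """
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([rows.ravel(), cols.ravel()], axis=1)  # (N, 2), row-major
    diff = coords[:, None, :] - coords[None, :, :]           # (N, N, 2) offsets
    d2 = (diff ** 2).sum(axis=-1)                             # (N, N) integer sq. distances
    return table[d2]                                          # (N, N) bias

def self_attention(x, wq, wk, wv, table):
    """Single-head self-attention over all pixels of an (h, w, c) feature map."""
    h, w, c = x.shape
    tokens = x.reshape(h * w, c)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    # Content term plus the invariant positional term.
    logits = q @ k.T / np.sqrt(q.shape[-1]) + positional_bias(h, w, table)
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    return (attn @ v).reshape(h, w, -1)

# Usage: rotating the input by 90 degrees rotates the output the same way.
rng = np.random.default_rng(0)
n, c, d = 5, 4, 3
x = rng.normal(size=(n, n, c))
wq, wk, wv = (rng.normal(size=(c, d)) for _ in range(3))
table = rng.normal(size=2 * (n - 1) ** 2 + 1)  # covers every squared distance on the grid
out = self_attention(x, wq, wk, wv, table)
print(np.allclose(self_attention(np.rot90(x), wq, wk, wv, table), np.rot90(out)))
```

A 90° rotation permutes pixel positions without changing any pairwise distance, so the bias matrix is permuted consistently with the content logits, and the output is permuted in the same way. The same holds for reflections such as `np.flip`. The trade-off in this simplified sketch is that the encoding is isotropic: it cannot distinguish two pixels at the same distance in different directions.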