LG · CV · Feb 15, 2021

Translational Equivariance in Kernelizable Attention

arXiv:2102.07680v1 · 8 citations
Originality: Incremental advance
AI Analysis

This work targets sample efficiency and robustness in vision tasks, as a step toward replacing Convolutional Neural Networks with Transformers. It is rated incremental because it builds on existing kernelizable attention methods.

The authors incorporate translational equivariance into efficient Transformers based on kernelizable attention (Performers). In their experiments, this significantly improves robustness to shifts of input images compared to a naive application of Performers.

While Transformer architectures have shown remarkable success, they are bound to the computation of all pairwise interactions of input elements and thus suffer from limited scalability. Recent work has succeeded in avoiding the computation of the complete attention matrix, yet this leads to problems down the line: the absence of an explicit attention matrix makes it more challenging to include inductive biases that rely on relative interactions between elements. An extremely powerful inductive bias is translational equivariance, which has been conjectured to be responsible for much of the success of Convolutional Neural Networks on image recognition tasks. In this work we show how translational equivariance can be implemented in efficient Transformers based on kernelizable attention (Performers). Our experiments highlight that the devised approach significantly improves the robustness of Performers to shifts of input images compared to their naive application. This represents an important step on the path to replacing Convolutional Neural Networks with more expressive Transformer architectures and will help to improve sample efficiency and robustness in this realm.
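
The two ideas in the abstract, attention without an explicit attention matrix and equivariance to input shifts, can be illustrated with a small NumPy sketch. This is not the authors' method: the feature map (`elu(x) + 1`), the 1D token sequence, the circular shift, and all function names are assumptions chosen for illustration, whereas Performers use random-feature maps and the paper targets images. The sketch checks that the linear-time form matches the explicit form, and that adding absolute positional embeddings (hypothetical random vectors here) breaks shift equivariance.

```python
import numpy as np

def feature_map(x):
    # Positive feature map elu(x) + 1. Illustrative assumption only;
    # Performers use random-feature maps instead.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v):
    # Kernelized attention in linear time: the N x N matrix is never formed.
    qf, kf = feature_map(q), feature_map(k)
    kv = kf.T @ v                    # (d, d_v) summary of keys and values
    z = qf @ kf.sum(axis=0)          # (N,) per-query normaliser
    return (qf @ kv) / z[:, None]

def explicit_attention(q, k, v):
    # Same computation with the full N x N attention matrix, for comparison.
    qf, kf = feature_map(q), feature_map(k)
    a = qf @ kf.T
    a = a / a.sum(axis=1, keepdims=True)
    return a @ v

def layer(x, wq, wk, wv, pos=None):
    # Single attention layer; `pos` is an optional absolute positional embedding.
    h = x if pos is None else x + pos
    return linear_attention(h @ wq, h @ wk, h @ wv)

rng = np.random.default_rng(0)
n, d, shift = 16, 8, 3
x = rng.normal(size=(n, d))
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
pos = rng.normal(size=(n, d))        # hypothetical absolute position vectors

# The linear-time form agrees with the explicit attention matrix.
matches_explicit = np.allclose(
    linear_attention(x @ wq, x @ wk, x @ wv),
    explicit_attention(x @ wq, x @ wk, x @ wv),
)

# Equivariance test: layer(shift(x)) should equal shift(layer(x)).
x_shifted = np.roll(x, shift, axis=0)
equivariant_without_pos = np.allclose(
    layer(x_shifted, wq, wk, wv),
    np.roll(layer(x, wq, wk, wv), shift, axis=0),
)
equivariant_with_abs_pos = np.allclose(
    layer(x_shifted, wq, wk, wv, pos),
    np.roll(layer(x, wq, wk, wv, pos), shift, axis=0),
)

print(matches_explicit, equivariant_without_pos, equivariant_with_abs_pos)
```

Without any positional information the layer is trivially shift-equivariant, but it also cannot tell token positions apart. Absolute embeddings restore position awareness at the cost of equivariance. Position information that depends only on relative offsets would keep both, which is harder to arrange when the attention matrix is never materialised.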

Code Implementations: 1 repo