LG, Oct 28, 2020

Higher Order Linear Transformer

arXiv:2010.14816v1
Originality: Synthesis-oriented
AI Analysis

This work addresses the computational cost of transformer models, a practical concern for machine learning practitioners. Its contribution is incremental, since it builds directly on existing linear transformer methods.

The paper tackles the quadratic cost of attention in sequence length. It extends a linear transformer approach to a second-order approximation of the softmax normalization, which keeps the cost of attention linear rather than quadratic in sequence length.
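One way to write the second-order approximation is sketched below. It assumes the plain second-order Taylor expansion of the exponential. The notation is ours, not the paper's: $q_i$, $k_j$, $v_j$ are rows of $Q$, $K$, $V$; $N$ is the sequence length; $D$ is the key dimension. The usual $1/\sqrt{D}$ temperature is assumed to be folded into $q$, and the paper's exact scaling may differ.

$$\mathrm{Att}_i = \frac{\sum_{j=1}^{N} \exp(q_i^\top k_j)\, v_j}{\sum_{j=1}^{N} \exp(q_i^\top k_j)}, \qquad \exp(q^\top k) \approx 1 + q^\top k + \tfrac{1}{2}(q^\top k)^2$$

Because $(q^\top k)^2 = \mathrm{vec}(q q^\top)^\top \mathrm{vec}(k k^\top)$, the approximation factorizes as a dot product of feature maps:

$$1 + q^\top k + \tfrac{1}{2}(q^\top k)^2 = \phi(q)^\top \phi(k), \qquad \phi(x) = \Big[\,1,\; x,\; \tfrac{1}{\sqrt{2}}\,\mathrm{vec}(x x^\top)\,\Big] \in \mathbb{R}^{1 + D + D^2}$$

$$\mathrm{Att}_i \approx \frac{\phi(q_i)^\top \sum_{j=1}^{N} \phi(k_j)\, v_j^\top}{\phi(q_i)^\top \sum_{j=1}^{N} \phi(k_j)}$$

The two sums over $j$ are computed once and shared by every query, so the cost grows linearly in $N$. The price is a feature dimension of $1 + D + D^2$. The denominator stays positive because $1 + t + t^2/2 = \tfrac{1}{2}\big((1+t)^2 + 1\big) > 0$ for every real $t$.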

Following up on the linear transformer part of the article by Katharopoulos et al., which takes the idea from Shen et al., the paper re-uses the trick that gives the attention mechanism linear complexity and extends it to a second-order approximation of the softmax normalization.
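The re-used trick can be illustrated with a small NumPy example. This is a minimal, non-causal sketch. It assumes the feature map phi(x) = [1, x, vec(x x^T)/sqrt(2)], which reproduces the second-order Taylor expansion of exp(q^T k). All function names are hypothetical and are not taken from the authors' code. The quadratic reference function exists only to check that the linear-time computation gives the same result.

```python
import numpy as np


def second_order_features(x):
    """Hypothetical feature map: rows of x (N, D) -> [1, x, vec(x x^T)/sqrt(2)], shape (N, 1 + D + D*D)."""
    n, d = x.shape
    ones = np.ones((n, 1))
    outer = np.einsum("ni,nj->nij", x, x).reshape(n, d * d) / np.sqrt(2.0)
    return np.concatenate([ones, x, outer], axis=1)


def linear_attention_2nd_order(q, k, v):
    """Non-causal attention with kernel 1 + q.k + (q.k)^2 / 2, linear in sequence length N."""
    phi_q = second_order_features(q)  # (N, F)
    phi_k = second_order_features(k)  # (N, F)
    kv = phi_k.T @ v                  # (F, M): sum_j phi(k_j) v_j^T, computed once
    k_sum = phi_k.sum(axis=0)         # (F,):   sum_j phi(k_j), computed once
    numer = phi_q @ kv                # (N, M)
    denom = phi_q @ k_sum             # (N,), strictly positive for this kernel
    return numer / denom[:, None]


def quadratic_reference(q, k, v):
    """Same second-order approximation computed explicitly in O(N^2), for checking."""
    s = q @ k.T
    w = 1.0 + s + 0.5 * s ** 2        # second-order Taylor expansion of exp(s)
    return (w @ v) / w.sum(axis=1, keepdims=True)


def softmax_attention(q, k, v):
    """Exact (non-approximated) softmax attention, O(N^2)."""
    s = q @ k.T
    s = s - s.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(s)
    return (w @ v) / w.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, m = 6, 4, 3
    q = 0.1 * rng.standard_normal((n, d))
    k = 0.1 * rng.standard_normal((n, d))
    v = rng.standard_normal((n, m))
    out = linear_attention_2nd_order(q, k, v)
    print(np.allclose(out, quadratic_reference(q, k, v)))  # True
    # Gap to exact softmax attention; small when |q.k| is small.
    print(np.abs(out - softmax_attention(q, k, v)).max())
```

The key design choice is to precompute `kv` and `k_sum` once. Every query then reuses them, which makes the cost linear in N. Because the feature dimension grows as D², the sketch is only cheaper than explicit attention when the sequence length is roughly large compared with D².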

