LG · Oct 28, 2020

Higher Order Linear Transformer

arXiv:2010.14816v1 · 1.2

Originality: Synthesis-oriented

AI Analysis

This work addresses the efficiency of transformer models, a concern for machine learning practitioners. It is incremental, building directly on existing linear transformer methods.

The paper tackles the cost of the attention mechanism, which grows quadratically with sequence length, by extending the linear transformer approach to a second-order approximation of the softmax normalization, so that attention remains linear in sequence length.
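For concreteness, here is one way to write such a second-order approximation. This sketch assumes it is the second-order Taylor expansion of the exponential inside the softmax, with the usual 1/√d scaling absorbed into the queries; the paper's exact feature map and normalization may differ. Because t² = (qᵀk)² can be written as an inner product of flattened outer products, the approximated kernel factorizes through a feature map φ, and the sums over keys can be computed once and shared by all queries:

```latex
\mathrm{Attn}(Q,K,V)_i = \frac{\sum_j \exp(q_i^\top k_j)\, v_j}{\sum_j \exp(q_i^\top k_j)},
\qquad \exp(t) \approx 1 + t + \tfrac{1}{2}t^2 > 0 \quad \forall t \in \mathbb{R}

1 + q^\top k + \tfrac{1}{2}(q^\top k)^2 = \phi(q)^\top \phi(k),
\qquad \phi(x) = \Big[\, 1,\; x,\; \tfrac{1}{\sqrt{2}}\,\mathrm{vec}(x x^\top) \,\Big] \in \mathbb{R}^{1 + d + d^2}

\mathrm{Attn}(Q,K,V)_i \approx
\frac{\phi(q_i)^\top \sum_j \phi(k_j)\, v_j^\top}{\phi(q_i)^\top \sum_j \phi(k_j)}
```

Since 1 + t + t²/2 has no real roots, the approximate weights stay strictly positive and the denominator never vanishes. The shared sums cost O(N·d²·d_v) for N tokens instead of the O(N²·d) of explicit attention, trading a feature dimension that grows with d² for linearity in sequence length.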

Following up on the linear transformer part of the article by Katharopoulos et al., which takes this idea from Shen et al., the trick that gives the attention mechanism linear complexity is reused and extended to a second-order approximation of the softmax normalization.
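A minimal NumPy sketch of the reused trick, assuming a second-order Taylor feature map φ(x) = [1, x, vec(xxᵀ)/√2] (an assumption about the paper's construction; Katharopoulos et al. themselves use a different map, elu(x) + 1). The function names are hypothetical. The linear version computes φ(K)ᵀV and φ(K)ᵀ1 once and reuses them for every query; the reference version builds the full score matrix with the same approximated weights, so the two should agree up to floating-point error.

```python
import numpy as np

def taylor2_features(x):
    """Hypothetical feature map: rows of x (n, d) -> phi(x) (n, 1 + d + d*d),
    chosen so that phi(q) @ phi(k) == 1 + q @ k + (q @ k) ** 2 / 2."""
    n, d = x.shape
    ones = np.ones((n, 1))
    # Flattened outer products x x^T, scaled so their inner product gives (q.k)^2 / 2.
    outer = np.einsum("ni,nj->nij", x, x).reshape(n, d * d) / np.sqrt(2.0)
    return np.concatenate([ones, x, outer], axis=1)

def linear_attention_taylor2(q, k, v):
    """Linear in sequence length: key-side sums are computed once, shared by all queries."""
    phi_q = taylor2_features(q)   # (n, D)
    phi_k = taylor2_features(k)   # (m, D)
    kv = phi_k.T @ v              # (D, d_v): sum_j phi(k_j) v_j^T
    z = phi_k.sum(axis=0)         # (D,):     sum_j phi(k_j)
    numer = phi_q @ kv            # (n, d_v)
    denom = phi_q @ z             # (n,), positive because 1 + t + t^2/2 > 0
    return numer / denom[:, None]

def quadratic_attention_taylor2(q, k, v):
    """Reference version: materializes the (n, m) matrix of approximated weights."""
    s = q @ k.T
    w = 1.0 + s + 0.5 * s ** 2
    return (w @ v) / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
q = rng.normal(size=(128, 8)) / np.sqrt(8)  # 1/sqrt(d) scaling folded into the queries
k = rng.normal(size=(128, 8))
v = rng.normal(size=(128, 16))
out = linear_attention_taylor2(q, k, v)
print(np.allclose(out, quadratic_attention_taylor2(q, k, v)))  # True
```

The flattened outer product duplicates symmetric entries; keeping only the upper triangle (with off-diagonal terms scaled by √2) would shrink the feature dimension to 1 + d + d(d+1)/2 without changing the result.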
