LG CL · Mar 9, 2021

Beyond Nyströmformer -- Approximation of self-attention by Spectral Shifting

arXiv:2103.05638v1 · 2 citations

Originality: Incremental advance

AI Analysis

This is an incremental improvement for efficient Transformer models in natural language processing.

The paper tackles the quadratic time complexity bottleneck of self-attention in Transformers by proposing an alternative approximation method to Nyströmformer, achieving a stronger error bound while maintaining linear time complexity O(n).

The Transformer is a powerful tool for many natural language tasks. It is based on self-attention, a mechanism that encodes the dependence of other tokens on each specific token, but computing self-attention is a bottleneck because of its quadratic time complexity. There are various approaches to reducing the time complexity, and matrix approximation is one of them. In Nyströmformer, the authors used a Nyström-based method to approximate the softmax. The Nyström method generates a fast approximation to any large-scale symmetric positive semidefinite (SPSD) matrix using only a few columns of the SPSD matrix. However, since the Nyström approximation is low-rank, it is of low accuracy when the spectrum of the SPSD matrix decays slowly. Here, an alternative approximation method is proposed that has a much stronger error bound than the Nyström method. Its time complexity is the same as that of Nyströmformer, which is $O\left({n}\right)$.
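The contrast drawn in the abstract can be illustrated with a small NumPy sketch. It is not the paper's algorithm: it applies the Nyström method to a generic SPSD matrix rather than to softmax attention, and the shift `delta` is assumed to be known in advance rather than estimated. The names `nystrom` and `shifted_nystrom` and the test matrix are hypothetical, chosen only to show why a slowly decaying (here, flat) spectral tail hurts a purely low-rank approximation and how subtracting a multiple of the identity first can help.

```python
import numpy as np

def nystrom(K, idx):
    """Nyström approximation of an SPSD matrix K from the columns in idx."""
    C = K[:, idx]                # n x c block of sampled columns
    W = K[np.ix_(idx, idx)]      # c x c intersection block
    # Explicit cutoff so numerically zero singular values of W are discarded.
    return C @ np.linalg.pinv(W, rcond=1e-10) @ C.T

def shifted_nystrom(K, idx, delta):
    """Simplified spectral shift: Nyström on K - delta*I, then add delta*I back.

    Assumption: delta is given and does not exceed the smallest eigenvalue of K.
    """
    eye = np.eye(K.shape[0])
    return nystrom(K - delta * eye, idx) + delta * eye

rng = np.random.default_rng(0)
n, r, c, delta = 200, 5, 10, 0.5
B = rng.standard_normal((n, r))
K = B @ B.T + delta * np.eye(n)  # rank-r part plus a flat spectral tail
idx = rng.choice(n, size=c, replace=False)

norm_K = np.linalg.norm(K)
err_plain = np.linalg.norm(K - nystrom(K, idx)) / norm_K
err_shift = np.linalg.norm(K - shifted_nystrom(K, idx, delta)) / norm_K
print(f"plain Nyström relative error:   {err_plain:.3e}")
print(f"shifted Nyström relative error: {err_shift:.3e}")
```

In this constructed case the shifted matrix is exactly rank `r`, so the shifted variant recovers `K` up to rounding, while the plain Nyström approximation has rank at most `c` and cannot capture the flat tail. Real attention matrices will not have such a clean structure, so the gap here is an idealized upper bound on the benefit.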
