LG · CL · Oct 15, 2021

On Learning the Transformer Kernel

arXiv:2110.08323v2 · 17 citations
Originality: Incremental advance
AI Analysis

This work addresses the quadratic computational cost of attention in Transformers, offering a scalable alternative with linear complexity. The advance is incremental, since it builds on prior efficient Transformer methods.

The authors tackled the problem of quadratic complexity in Transformers by introducing KERNELIZED TRANSFORMER, a framework that learns the kernel function via spectral feature maps, reducing time and space complexity to linear while achieving performance comparable to existing efficient architectures.

In this work we introduce KERNELIZED TRANSFORMER, a generic, scalable, data-driven framework for learning the kernel function in Transformers. Our framework approximates the Transformer kernel as a dot product between spectral feature maps and learns the kernel by learning the spectral distribution. This not only helps in learning a generic kernel end-to-end, but also reduces the time and space complexity of Transformers from quadratic to linear. We show that KERNELIZED TRANSFORMERS achieve performance comparable to existing efficient Transformer architectures, in terms of both accuracy and computational efficiency. Our study also demonstrates that the choice of the kernel has a substantial impact on performance, and kernel learning variants are competitive alternatives to fixed kernel Transformers, in both long and short sequence tasks.
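To make the abstract's central idea concrete, here is a minimal NumPy sketch of attention computed through spectral (random Fourier) feature maps. It is an illustration under stated assumptions, not the authors' implementation: the function names, the diagonal Gaussian spectral distribution with parameters `mu` and `log_sigma`, and all sizes are hypothetical choices made for this example. In a trainable model, `mu` and `log_sigma` would be learned by gradient descent; here they are fixed so that the features approximate a Gaussian kernel.

```python
import numpy as np

def sample_frequencies(mu, log_sigma, m, rng):
    # Reparameterized draw omega = mu + sigma * eps, so that mu and log_sigma
    # could be trained end-to-end (assumed parameterization, for illustration).
    eps = rng.standard_normal((m, mu.shape[0]))
    return mu + np.exp(log_sigma) * eps

def spectral_features(x, omega):
    # Map x of shape (n, d) to features of shape (n, 2m) using the
    # frequencies omega of shape (m, d); phi(x) . phi(y) approximates k(x, y).
    proj = x @ omega.T
    m = omega.shape[0]
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1) / np.sqrt(m)

def linear_attention(q, k, v, omega):
    # Cost is linear in sequence length: keys and values are summarized
    # once, and the n-by-n score matrix is never formed.
    phi_q = spectral_features(q, omega)
    phi_k = spectral_features(k, omega)
    kv = phi_k.T @ v           # (2m, d_v) summary of keys and values
    z = phi_k.sum(axis=0)      # (2m,) normalizer summary
    return (phi_q @ kv) / (phi_q @ z)[:, None]

def quadratic_attention(q, k, v, omega):
    # Reference computation that forms the full n-by-n kernel matrix.
    scores = spectral_features(q, omega) @ spectral_features(k, omega).T
    return (scores @ v) / scores.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, d, m = 16, 8, 256
# Small-magnitude inputs keep the approximate kernel values well above zero,
# so the normalizer stays positive in this toy setting.
q = 0.3 * rng.standard_normal((n, d))
k = 0.3 * rng.standard_normal((n, d))
v = rng.standard_normal((n, d))

# mu = 0 and sigma = 1 correspond to the Gaussian kernel exp(-||x - y||^2 / 2).
omega = sample_frequencies(np.zeros(d), np.zeros(d), m, rng)
out_lin = linear_attention(q, k, v, omega)
out_quad = quadratic_attention(q, k, v, omega)
```

Both functions compute the same quantity; the linear version only reorders the matrix products, which is where the quadratic-to-linear reduction comes from. Note that trigonometric features can produce negative kernel estimates, so a practical implementation would need to guard the normalizer; this sketch does not.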

Code Implementations: 1 repo
