LG | May 24, 2024

Spectraformer: A Unified Random Feature Framework for Transformer

arXiv:2405.15310v5 | 4 citations | h-index: 2 | Has Code | ACM Trans Intell Syst Technol
Originality: Incremental advance
AI Analysis

This work addresses the computational bottleneck that Transformers face on long sequences and offers a systematic approach to improving efficiency. The advance is incremental, since it builds on existing random feature methods.

The authors address efficient approximation of the Transformer attention mechanism by introducing Spectraformer, a unified random feature framework. On the Long Range Arena benchmark it achieves performance comparable to top sparse and low-rank methods, establishing a new state of the art among random feature-based efficient Transformers.

Linearization of attention using various kernel approximation and kernel learning techniques has shown promise. Past methods used a subset of combinations of component functions and weight matrices within the random feature paradigm. We identify the need for a systematic comparison of different combinations of weight matrices and component functions for attention learning in Transformer. Hence, we introduce Spectraformer, a unified framework for approximating and learning the kernel function in the attention mechanism of the Transformer. Our empirical results demonstrate, for the first time, that a random feature-based approach can achieve performance comparable to top-performing sparse and low-rank methods on the challenging Long Range Arena benchmark. Thus, we establish a new state-of-the-art for random feature-based efficient Transformers. The framework also produces many variants that offer different advantages in accuracy, training time, and memory consumption. Our code is available at: https://github.com/cruiseresearchgroup/spectraformer .
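To make the "component function plus weight matrix" idea in the abstract concrete, here is a minimal NumPy sketch of random feature attention. It is not taken from the Spectraformer repository. The function names (`gaussian_weights`, `orthogonal_weights`, `positive_features`, `linear_attention`) are hypothetical, and the positive exponential feature map and the two weight-matrix choices are assumptions, picked as common examples from the random feature literature. The sketch only shows that the feature map and the weight matrix can be swapped independently, which is the kind of combination the framework compares.

```python
import numpy as np


def gaussian_weights(num_features, dim, rng):
    """Weight matrix choice 1 (assumed): rows drawn i.i.d. from N(0, I)."""
    return rng.standard_normal((num_features, dim))


def orthogonal_weights(num_features, dim, rng):
    """Weight matrix choice 2 (assumed): orthogonal blocks whose rows are
    rescaled to have norms distributed like those of Gaussian rows."""
    blocks = []
    for _ in range(-(-num_features // dim)):  # ceil(num_features / dim)
        q_mat, _r = np.linalg.qr(rng.standard_normal((dim, dim)))
        blocks.append(q_mat.T)
    w = np.vstack(blocks)[:num_features]
    norms = np.linalg.norm(
        rng.standard_normal((num_features, dim)), axis=1, keepdims=True
    )
    return w * norms


def positive_features(x, w):
    """Component function (assumed): exp(w.x - |x|^2 / 2) / sqrt(m).
    For Gaussian rows w, E[phi(x) . phi(y)] = exp(x . y)."""
    m = w.shape[0]
    half_sq_norm = 0.5 * np.sum(x * x, axis=-1, keepdims=True)
    return np.exp(x @ w.T - half_sq_norm) / np.sqrt(m)


def linear_attention(q, k, v, w, feature_fn=positive_features):
    """Approximate softmax attention in time linear in sequence length."""
    scale = q.shape[-1] ** -0.25      # so that (q*scale).(k*scale) = q.k/sqrt(d)
    qf = feature_fn(q * scale, w)     # (n, m)
    kf = feature_fn(k * scale, w)     # (n, m)
    kv = kf.T @ v                     # (m, d_v), computed once for all queries
    normalizer = qf @ kf.sum(axis=0)  # (n,), approximates the softmax denominator
    return (qf @ kv) / normalizer[:, None]


def softmax_attention(q, k, v):
    """Exact quadratic-time attention, used here only as a reference."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v


# Usage: compare both weight-matrix choices against exact attention.
rng = np.random.default_rng(0)
q = 0.3 * rng.standard_normal((16, 8))
k = 0.3 * rng.standard_normal((16, 8))
v = rng.standard_normal((16, 4))
for make_weights in (gaussian_weights, orthogonal_weights):
    w = make_weights(2048, 8, rng)
    err = np.max(np.abs(linear_attention(q, k, v, w) - softmax_attention(q, k, v)))
    print(make_weights.__name__, "max abs error:", err)
```

Because `kf.T @ v` is computed once and reused for every query, the cost grows linearly with sequence length instead of quadratically. Swapping `make_weights` or `feature_fn` gives a different variant without touching the attention routine.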

Code Implementations: 2 repos