CL · Oct 6, 2025

The End of Transformers? On Challenging Attention and the Rise of Sub-Quadratic Architectures

arXiv:2510.05364v1 · 2 citations · h-index: 4
Originality: Synthesis-oriented
AI Analysis

This paper addresses the computational inefficiency of transformers on long contexts, a problem that matters to AI researchers and practitioners. Its contribution is incremental, since it reviews existing approaches.

The paper surveys recent efforts to overcome the quadratic complexity bottleneck of transformers in sequence processing. It analyzes sub-quadratic architectures, such as attention variants and state space models, to assess whether they can challenge transformer dominance.

Transformers have dominated sequence processing tasks for the past seven years -- most notably language modeling. However, the inherent quadratic complexity of their attention mechanism remains a significant bottleneck as context length increases. This paper surveys recent efforts to overcome this bottleneck, including advances in (sub-quadratic) attention variants, recurrent neural networks, state space models, and hybrid architectures. We critically analyze these approaches in terms of compute and memory complexity, benchmark results, and fundamental limitations to assess whether the dominance of pure-attention transformers may soon be challenged.
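To make the quadratic bottleneck concrete, here is a minimal sketch that is not taken from the paper. It uses kernelized ("linear") attention as one illustrative sub-quadratic variant, with the feature map elu(x) + 1 assumed as the kernel. The function names are hypothetical. Both functions compute the same kernel attention output: the first materializes all n × n pairwise weights, while the second summarizes keys and values once, so its cost grows linearly in sequence length n. Note that this is kernel attention, not an exact reproduction of softmax attention.

```python
import math


def phi(x):
    # Assumed positive feature map elu(x) + 1: x + 1 for x > 0, exp(x) otherwise.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def kernel_attention_quadratic(Q, K, V):
    """Computes every query-key weight explicitly: O(n^2 * d) time."""
    d_v = len(V[0])
    out = []
    for q in Q:
        fq = phi(q)
        weights = [dot(fq, phi(k)) for k in K]  # n weights for this query
        z = sum(weights)                         # normalizer, always > 0
        out.append(
            [sum(w * v[c] for w, v in zip(weights, V)) / z for c in range(d_v)]
        )
    return out


def kernel_attention_linear(Q, K, V):
    """Same output, but keys and values are summarized once: O(n * d * d_v) time."""
    d, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d)]  # running sum of phi(k_j) v_j^T
    z = [0.0] * d                        # running sum of phi(k_j)
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d):
            z[a] += fk[a]
            for c in range(d_v):
                S[a][c] += fk[a] * v[c]
    out = []
    for q in Q:
        fq = phi(q)
        norm = dot(fq, z)
        out.append(
            [sum(fq[a] * S[a][c] for a in range(d)) / norm for c in range(d_v)]
        )
    return out
```

The two functions agree up to floating-point rounding because the sum over keys can be factored out of the per-query computation. That factoring is unavailable for standard softmax attention, which is why the kernel substitution changes the model and not just the implementation.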

