LGAug 29, 2025

From TLinFormer to TConstFormer: The Leap to Constant-Time Transformer Attention: Achieving O(1) Computation and O(1) KV Cache during Autoregressive Inference

arXiv:2509.00202v1h-index: 1
Originality Highly original
AI Analysis

This addresses a critical bottleneck for efficient and robust streaming language model applications, representing a significant leap rather than an incremental improvement.

The paper tackles the problem of Transformer autoregressive inference suffering from linearly growing KV cache and O(N^2 d) computational complexity, which hinders processing ultra-long sequences, and introduces TConstFormer to achieve O(1) KV cache and amortized O(1) computation, demonstrating overwhelming advantages in speed, memory efficiency, and performance on long-text tasks.

Although the Transformer has become the cornerstone of modern AI, its autoregressive inference suffers from a linearly growing KV Cache and a computational complexity of O(N^2 d), severely hindering its ability to process ultra-long sequences. To overcome this limitation, this paper introduces the TConstFormer architecture, building upon our previous work, TLinFormer. TConstFormer employs an innovative periodic state update mechanism to achieve a truly constant-size O(1) KV Cache. The computational complexity of this mechanism is also O(1) in an amortized sense: it performs purely constant-time computations for $k-1$ consecutive steps (e.g., $k=256$) and executes a single linear-time global information synchronization only on the $k$-th step. Theoretical calculations and experimental results demonstrate that TConstFormer exhibits an overwhelming advantage over baseline models in terms of speed, memory efficiency, and overall performance on long-text inference tasks. This breakthrough paves the way for efficient and robust streaming language model applications.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes