LG CL CV | Apr 9, 2025

CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers

arXiv:2504.06704v1
Originality: Highly original
AI Analysis

The paper addresses the difficulty of scaling Transformers to longer sequences, which matters for applications in natural language processing and computer vision. It represents an incremental improvement with practical efficiency gains.

The paper tackles the O(N^2) complexity bottleneck of standard Transformer attention by introducing Circular-convolutional ATtention (CAT), a Fourier-based method that reduces complexity to O(N log N). The authors report consistent accuracy improvements and about a 10% speedup in naive PyTorch implementations on benchmarks such as ImageNet-1k and WikiText-103.

Transformers have driven remarkable breakthroughs in natural language processing and computer vision, yet their standard attention mechanism still imposes O(N^2) complexity, hindering scalability to longer sequences. We introduce Circular-convolutional ATtention (CAT), a Fourier-based approach that efficiently applies circular convolutions to reduce complexity without sacrificing representational power. CAT achieves O(N log N) computations, requires fewer learnable parameters by streamlining fully-connected layers, and introduces no heavier operations, resulting in consistent accuracy improvements and about a 10% speedup in naive PyTorch implementations on large-scale benchmarks such as ImageNet-1k and WikiText-103. Grounded in an engineering-isomorphism framework, CAT's design not only offers practical efficiency and ease of implementation but also provides insights to guide the development of next-generation, high-performance Transformer architectures. Finally, our ablation studies highlight the key conditions underlying CAT's success, shedding light on broader principles for scalable attention mechanisms.
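
The complexity claim in the abstract rests on a standard identity: multiplying by an N x N circulant matrix is a circular convolution, which the FFT computes in O(N log N) instead of O(N^2). The sketch below illustrates that identity only. It is not the authors' implementation, and the function names (`circulant`, `circular_mix_direct`, `circular_mix_fft`), the softmax-normalised weight vector `w`, and the toy sizes are assumptions made for illustration.

```python
import numpy as np


def circulant(w):
    """Build the N x N circulant matrix C with C[i, j] = w[(i - j) % N]."""
    n = len(w)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return w[idx]


def circular_mix_direct(w, v):
    """O(N^2) reference: multiply values v (N x D) by the explicit circulant matrix."""
    return circulant(w) @ v


def circular_mix_fft(w, v):
    """O(N log N) path: the same product computed via the convolution theorem."""
    n = len(w)
    w_hat = np.fft.rfft(w)            # spectrum of the weights, shape (N//2 + 1,)
    v_hat = np.fft.rfft(v, axis=0)    # spectrum of each value channel, shape (N//2 + 1, D)
    return np.fft.irfft(w_hat[:, None] * v_hat, n=n, axis=0)


def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


# Toy example (sizes and random inputs are assumptions, not from the paper).
rng = np.random.default_rng(0)
N, D = 8, 4
w = softmax(rng.standard_normal(N))   # one weight per relative offset
v = rng.standard_normal((N, D))       # value vectors, one row per token

out_direct = circular_mix_direct(w, v)
out_fft = circular_mix_fft(w, v)
print(np.allclose(out_direct, out_fft))
```

Both paths produce the same N x D output; only the FFT path avoids materialising the N x N matrix, which is where the sub-quadratic cost comes from.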

