Ring Attention
Long-context / context-window extension
superseded — cited as a baseline and beaten by newer methods
1 paper critiques it · 1 beats it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites Ring Attention as a baseline.
“While this approach achieves linear complexity O(nk), it suffers from two critical limitations: (1) limited receptive field growth that scales linearly with depth”
— $π$-Attention: Periodic Sparse Transformers for Efficient Long-Context Modeling
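The quoted critique can be illustrated with a small sketch. It assumes that the O(nk) in the quote means each of n tokens attends to a local window of k neighbours; the quote itself does not spell this out, and the helper names `local_mask` and `receptive_field_sizes` are hypothetical, not taken from the cited paper. Under that assumption, the number of input positions that can influence a token grows by a fixed amount per layer, which is the linear-in-depth growth the critique refers to.

```python
def local_mask(n, k):
    """Assumed local pattern: token i attends to tokens j with |i - j| <= k // 2."""
    half = k // 2
    return [[abs(i - j) <= half for j in range(n)] for i in range(n)]


def receptive_field_sizes(n, k, depth, token):
    """Count the input positions that can influence `token` after each layer."""
    mask = local_mask(n, k)
    reach = {token}
    sizes = []
    for _ in range(depth):
        # One more layer: everything the currently reachable positions attend to.
        reach = {j for i in reach for j in range(n) if mask[i][j]}
        sizes.append(len(reach))
    return sizes


n, k = 64, 5
mask = local_mask(n, k)

# Attended (query, key) pairs: at most n * k, hence O(nk) rather than O(n^2).
attended_pairs = sum(sum(row) for row in mask)

# Receptive field of a middle token after 1..4 layers.
sizes = receptive_field_sizes(n, k, depth=4, token=n // 2)
print(attended_pairs, n * k, sizes)
```

With this window the receptive field widens by k - 1 positions per layer, so covering a sequence of length n needs on the order of n / k layers.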
Beaten on benchmarks
Head-to-head results where a newer method reports beating Ring Attention. Values are copied from the source paper's tables — verify against the cited paper.
All results below come from $π$-Attention: Periodic Sparse Transformers for Efficient Long-Context Modeling. Each line gives the metric and benchmark, then PiAttention (Ours) vs Ring Attention.
- Acc [ListOps]: 67.9 vs 62.3
- F1 [RetrievalQA]: 84.5 vs 78.9
- Acc [Pathfinder]: 89.1 vs 85.2
- R@1 [MSCOCO]: 72.4 vs 68.3
- R@5 [MSCOCO]: 91.2 vs 88.9
- R@10 [MSCOCO]: 96.8 vs 95.1
- R@1 [Flickr30K]: 76.3 vs 72.1
- R@5 [Flickr30K]: 94.1 vs 91.8
- R@10 [Flickr30K]: 98.2 vs 96.9
- Training Time [WikiText-103] (lower is better): 12.4 vs 14.6
- Inference Time [WikiText-103] (lower is better): 36.7 vs 44.3
- MFU [WikiText-103]: 55.4 vs 51.7
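To make the size of each gap easier to compare, the reported values can be turned into absolute and relative margins. The snippet below is a minimal sketch: the numbers are copied from the entries above, the `margin` helper is hypothetical, and treating the two timing rows as lower-is-better is an assumption (their units are not stated here).

```python
# (metric, PiAttention, Ring Attention, higher_is_better)
results = [
    ("Acc [ListOps]", 67.9, 62.3, True),
    ("F1 [RetrievalQA]", 84.5, 78.9, True),
    ("Acc [Pathfinder]", 89.1, 85.2, True),
    ("R@1 [MSCOCO]", 72.4, 68.3, True),
    ("R@5 [MSCOCO]", 91.2, 88.9, True),
    ("R@10 [MSCOCO]", 96.8, 95.1, True),
    ("R@1 [Flickr30K]", 76.3, 72.1, True),
    ("R@5 [Flickr30K]", 94.1, 91.8, True),
    ("R@10 [Flickr30K]", 98.2, 96.9, True),
    # Assumption: timing rows are lower-is-better; units are not given.
    ("Training Time [WikiText-103]", 12.4, 14.6, False),
    ("Inference Time [WikiText-103]", 36.7, 44.3, False),
    ("MFU [WikiText-103]", 55.4, 51.7, True),
]


def margin(pi, ring, higher_is_better):
    """Return (absolute, relative %) improvement over the Ring Attention value."""
    diff = pi - ring if higher_is_better else ring - pi
    return diff, 100.0 * diff / ring


margins = {name: margin(pi, ring, hib) for name, pi, ring, hib in results}

for name, (abs_diff, rel_diff) in margins.items():
    print(f"{name}: {abs_diff:+.1f} ({rel_diff:+.1f}% relative to Ring Attention)")
```

These margins are only as reliable as the source tables, so they should be checked against the cited paper before being quoted.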