Superseded baseline · #39 of 53 most-superseded
BigBird
Big Bird: Transformers for Longer Sequences · Long-context / context-window extension · first seen Jul 28, 2020
superseded — cited as a baseline and beaten by newer methods
0 papers critique it · 1 beat it on benchmarks
Beaten on benchmarks
Head-to-head results where a newer method reports beating BigBird. Values are copied from the source paper's tables — verify against the cited paper.
- $π$-Attention: Periodic Sparse Transformers for Efficient Long-Context Modeling
PiAttention beats BigBird · Training Time [WikiText-103]
12.4 (PiAttention) vs 15.3 (BigBird)
- $π$-Attention: Periodic Sparse Transformers for Efficient Long-Context Modeling
PiAttention beats BigBird · Inference Time [WikiText-103]
36.7 (PiAttention) vs 42.8 (BigBird)
- $π$-Attention: Periodic Sparse Transformers for Efficient Long-Context Modeling
PiAttention beats BigBird · MFU [WikiText-103]
55.4 (PiAttention) vs 48.7 (BigBird)
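
To put the three head-to-head results on a common scale, the relative margins can be computed directly from the listed values. The sketch below is illustrative only: the source does not state units, and the direction of each metric (lower is better for the two time metrics, higher is better for MFU) is an assumption consistent with the "beats" labels. The names `results` and `relative_gain` are hypothetical helpers, not part of either paper.

```python
# Values copied from the listing above; units are not stated in the source.
# The "lower_is_better" flags are assumptions inferred from the "beats" labels.
results = {
    "Training Time": {"pi_attention": 12.4, "bigbird": 15.3, "lower_is_better": True},
    "Inference Time": {"pi_attention": 36.7, "bigbird": 42.8, "lower_is_better": True},
    "MFU": {"pi_attention": 55.4, "bigbird": 48.7, "lower_is_better": False},
}


def relative_gain(new: float, baseline: float, lower_is_better: bool) -> float:
    """Fractional improvement of `new` over `baseline`; positive when `new` is better."""
    if lower_is_better:
        return (baseline - new) / baseline
    return (new - baseline) / baseline


gains = {
    metric: relative_gain(row["pi_attention"], row["bigbird"], row["lower_is_better"])
    for metric, row in results.items()
}

for metric, gain in gains.items():
    # Print each margin as a percentage of the BigBird value.
    print(f"{metric} [WikiText-103]: {gain:.1%} better than BigBird")
```

Under these assumptions the margins come out to roughly 19% (training time), 14% (inference time), and 14% (MFU), all relative to the BigBird value. As with the raw numbers, verify against the cited paper before relying on them.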