Living systematic review
Long-context / context-window extension
Extending the usable context window of LLMs — rotary-position scaling (RoPE/PI/NTK/YaRN/LongRoPE), length extrapolation, and long-context fine-tuning.
69 papers · 137 critique receipts · 595 benchmark results · updated Jun 18, 2026
Most-superseded baselines
Ranked by how many distinct papers critique or beat each method. These are the standard baselines newer work routinely measures against.
- 1. RoPE · RoPE · RoFormer: Enhanced Transformer with Rotary Position Embedding
17 papers critique it · 8 beat it on benchmarks
- 2. YaRN · YaRN · YaRN: Efficient Context Window Extension of Large Language Models
10 papers critique it · 8 beat it on benchmarks
- 3. Position Interpolation · YaRN · Extending Context Window of Large Language Models via Positional Interpolation
8 papers critique it · 5 beat it on benchmarks
- 4. APE · RoPE · APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding
8 papers critique it · 3 beat it on benchmarks
- 5. StreamingLLM · StreamingLLM · Efficient Streaming Language Models with Attention Sinks
3 papers critique it · 7 beat it on benchmarks
- 7. ALiBi · RoPE · Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
4 papers critique it · 5 beat it on benchmarks
- 8. Self-Extend · YaRN · LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
5 papers critique it · 4 beat it on benchmarks
- 9. H2O · StreamingLLM · H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
5 papers critique it · 3 beat it on benchmarks
- 10. Activation Beacon · Activation Beacon · Long Context Compression with Activation Beacon
2 papers critique it · 4 beat it on benchmarks
- 12. KERPLE · RoPE · KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation
2 papers critique it · 3 beat it on benchmarks
Sub-problems
Methods that compete on the same benchmarks cluster into distinct sub-problems.
YaRN · 24 methods
YaRN · Position Interpolation · NTK-aware · Self-Extend · DCA · LongRoPE
StreamingLLM · 24 methods
StreamingLLM · H2O · MInference · SnapKV · Quest · PyramidKV
Activation Beacon · 12 methods
Activation Beacon · CEPE · Mamba · HMT · LongLLMLingua · RMT
RAG · 6 methods
RAG · CoA · LongAgent · Chain-of-Agents (CoA) · Graph of Agents (GoA) · XpandA
Ring Attention · 4 methods
The frontier
Recent methods not yet superseded in the knowledge base.
- BA-Att · Efficient Long-Context Modeling in Diffusion Language Models via Block Approximate Sparse Attention · May 19, 2026
- May 14, 2026
- Mask Prior Suppression and Monotonic RoPE Scaling · Mitigating Mask Prior Drift and Positional Attention Collapse in Large Diffusion Vision-Language Models · May 14, 2026
- Apr 1, 2026
- Mar 30, 2026
- Mar 5, 2026
- C^2RoPE · C^2ROPE: Causal Continuous Rotary Positional Encoding for 3D Large Multimodal-Models Reasoning · Feb 11, 2026
- Dec 10, 2025
- Imaginary Extension of Rotary Position Embeddings · Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs · Dec 8, 2025
- Cross-Resolution Phase-Aligned Attention (CRPA) · One Attention, One Scale: Phase-Aligned Rotary Positional Embeddings for Mixed-Resolution Diffusion Transformer · Nov 24, 2025
- Nov 21, 2025
- Nov 12, 2025