Living systematic review

Long-context / context-window extension

Methods for extending the usable context window of LLMs: rotary-position scaling (RoPE/PI/NTK/YaRN/LongRoPE), length extrapolation, and long-context fine-tuning.

69 papers · 137 critique receipts · 595 benchmark results · updated Jun 18, 2026
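As a rough illustration of what "rotary-position scaling" means, the sketch below contrasts plain RoPE angles with two of the listed variants: Position Interpolation (PI), which compresses position indices, and NTK-aware scaling, which enlarges the rotary base. The function names, the default base of 10000, and the `scale` parameter (target length divided by training length) are illustrative assumptions, not taken from any particular codebase. YaRN and LongRoPE refine this idea with per-frequency scaling and are not shown.

```python
def rope_inv_freq(dim, base=10000.0):
    """Inverse frequencies theta_i = base^(-2i/dim), one per 2-D rotary pair."""
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]


def rope_angles(position, inv_freq):
    """Rotation angle of each rotary pair at a given token position."""
    return [position * f for f in inv_freq]


def pi_angles(position, inv_freq, scale):
    """Position Interpolation: divide the position by `scale` so that
    positions up to scale * L_train map back into the trained range."""
    return [(position / scale) * f for f in inv_freq]


def ntk_inv_freq(dim, scale, base=10000.0):
    """NTK-aware scaling: keep positions as they are but enlarge the base,
    which barely changes high frequencies and stretches low ones."""
    new_base = base * scale ** (dim / (dim - 2))
    return rope_inv_freq(dim, new_base)
```

With `scale = 4`, `pi_angles(8192, inv_freq, 4)` yields the same angles as `rope_angles(2048, inv_freq)`, while `ntk_inv_freq` leaves the highest-frequency pair unchanged and slows the lowest-frequency pair by exactly the factor 4.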

Most-superseded baselines

Ranked by how many distinct papers critique or beat each method. These are the standard baselines newer work routinely measures against.

1. RoPE (sub-problem: RoPE)
RoFormer: Enhanced Transformer with Rotary Position Embedding
17 papers critique it · 8 beat it on benchmarks

2. YaRN (sub-problem: YaRN)
YaRN: Efficient Context Window Extension of Large Language Models
10 papers critique it · 8 beat it on benchmarks

3. Position Interpolation (sub-problem: YaRN)
Extending Context Window of Large Language Models via Positional Interpolation
8 papers critique it · 5 beat it on benchmarks

4. APE (sub-problem: RoPE)
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding
8 papers critique it · 3 beat it on benchmarks

5. StreamingLLM (sub-problem: StreamingLLM)
Efficient Streaming Language Models with Attention Sinks
3 papers critique it · 7 beat it on benchmarks

6. NTK-aware (sub-problem: YaRN)
4 papers critique it · 6 beat it on benchmarks

7. ALiBi (sub-problem: RoPE)
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
4 papers critique it · 5 beat it on benchmarks

8. Self-Extend (sub-problem: YaRN)
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
5 papers critique it · 4 beat it on benchmarks

9. H2O (sub-problem: StreamingLLM)
H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
5 papers critique it · 3 beat it on benchmarks

10. Activation Beacon (sub-problem: Activation Beacon)
Long Context Compression with Activation Beacon
2 papers critique it · 4 beat it on benchmarks

11. MInference (sub-problem: StreamingLLM)
1 paper critiques it · 4 beat it on benchmarks

12. KERPLE (sub-problem: RoPE)
KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation
2 papers critique it · 3 beat it on benchmarks
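
The ranking rule stated above ("how many distinct papers critique or beat each method") can be sketched as a count over the union of critiquing and beating papers. This is a minimal sketch under assumptions: the `(paper_id, method)` record shape, the function name `rank_baselines`, and the alphabetical tie-break are hypothetical, and the page does not say how ties or overlapping papers are actually handled.

```python
from collections import defaultdict


def rank_baselines(critiques, wins):
    """Rank methods by the number of distinct papers that critique or beat them.

    `critiques` and `wins` are iterables of (paper_id, method) pairs
    (a hypothetical record shape). A paper that both critiques and beats
    a method is counted once for that method.
    """
    papers_by_method = defaultdict(set)
    for paper_id, method in list(critiques) + list(wins):
        papers_by_method[method].add(paper_id)
    counts = [(method, len(papers)) for method, papers in papers_by_method.items()]
    # Most-superseded first; ties broken alphabetically (an assumption).
    return sorted(counts, key=lambda item: (-item[1], item[0]))
```

For example, with placeholder records `critiques = [("p1", "A"), ("p2", "A"), ("p1", "B")]` and `wins = [("p2", "A"), ("p3", "A"), ("p3", "B"), ("p4", "C")]`, method `A` is touched by three distinct papers, `B` by two, and `C` by one, so the ranking is `A`, `B`, `C`.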

Sub-problems

Methods that compete on the same benchmarks cluster into distinct sub-problems.

YaRN · 24 methods

YaRN · Position Interpolation · NTK-aware · Self-Extend · DCA · LongRoPE

RoPE · 18 methods

RoPE · APE · ALiBi · KERPLE · FIRE · HoPE

StreamingLLM · 24 methods

StreamingLLM · H2O · MInference · SnapKV · Quest · PyramidKV

Activation Beacon · 12 methods

Activation Beacon · CEPE · Mamba · HMT · LongLLMLingua · RMT

LongLoRA · 13 methods

LongLoRA · ABF · LoCoCo · D2O · FastKV · ThinK

RAG · 6 methods

RAG · CoA · LongAgent · Chain-of-Agents (CoA) · Graph of Agents (GoA) · XpandA

Ring Attention · 4 methods

Ring Attention · BigBird · Longformer · ΠAttention
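
One simple way to read "methods that compete on the same benchmarks cluster into distinct sub-problems" is as connected components: two methods land in the same group if a chain of shared benchmarks links them. The sketch below illustrates that reading only. The knowledge base's actual clustering procedure is not described on this page, and the function name, input shape, and placeholder method and benchmark names are all hypothetical.

```python
def cluster_by_shared_benchmarks(benchmarks_by_method):
    """Group methods into connected components linked by shared benchmarks.

    `benchmarks_by_method` maps a method name to the set of benchmarks
    it reports results on (a hypothetical input shape).
    """
    parent = {method: method for method in benchmarks_by_method}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_method_on = {}  # benchmark -> first method seen reporting on it
    for method, benchmarks in benchmarks_by_method.items():
        for benchmark in benchmarks:
            if benchmark in first_method_on:
                parent[find(method)] = find(first_method_on[benchmark])
            else:
                first_method_on[benchmark] = method

    clusters = {}
    for method in benchmarks_by_method:
        clusters.setdefault(find(method), []).append(method)
    return sorted(sorted(members) for members in clusters.values())
```

With placeholder input `{"m1": {"b1", "b2"}, "m2": {"b2"}, "m3": {"b3"}, "m4": {"b3", "b4"}}`, methods `m1` and `m2` share `b2` and `m3` and `m4` share `b3`, so two groups result. A real pipeline would more likely weight links by how often methods are compared, since a single widely used benchmark would otherwise merge everything into one component.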

The frontier

Recent methods not yet superseded in the knowledge base.