LongRoPE
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Long-context / context-window extension · first seen Feb 21, 2024
superseded — cited as a baseline and beaten by newer methods
3 papers critique it · 1 beats it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites LongRoPE as a baseline.
“traditional approaches chen2023extending often suffer from a significant performance drop chen2023clex, ding2024longrope at the target length due to their limited generalization capability.”
— DCIS: Efficient Length Extrapolation of LLMs via Divide-and-Conquer Scaling Factor Search
“rescaling factors derived from previous methods often fall short of achieving the effective target context length.”
— LongRoPE2: Near-Lossless LLM Context Window Scaling
“due to the exponential search space complexity, it is challenging for those methods to estimate an optimal frequency; they also need heavy searching cost, for instance, it costs LongRoPE nearly 3 days to search an optimal frequency for a 256k context window using an A100 GPU”
— PSC: Extending Context Window of Large Language Models via Phase Shift Calibration
Beaten on benchmarks
Head-to-head results where a newer method reports beating LongRoPE. Values are copied from the source paper's tables — verify against the cited paper.
- LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats LongRoPE · RULER average at 128k [Base Model: Phi3-mini (3.8B)]
58.81 vs 53.71
- LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats LongRoPE · RULER average at 128k [Base Model: LLaMA3-8B]
82.03 vs 73.40
- LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats LongRoPE · Average [Base Model: Phi3-mini (3.8B) with 128k context window]
61.7 vs 58.5
- LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats LongRoPE · Average [Base Model: LLaMA3-8B with 128k context window]
55.7 vs 54.6
- LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats LongRoPE · LOFT Avg. [Base model: Phi3-mini (3.8B)]
23.00 vs 21.14
- LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats LongRoPE · InfiniteBench - LongBench Avg. [Base model: Phi3-mini (3.8B)]
55.23 vs 50.67
- LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats LongRoPE · LOFT Avg. [Base model: LLaMA3-8B]
74.28 vs 60.85
- LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats LongRoPE · InfiniteBench - LongBench Avg. [Base model: LLaMA3-8B]
73.37 vs 70.39
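To make the size of each gap easier to compare, the head-to-head values above can be turned into absolute and relative margins. The sketch below is illustrative only: the dictionary labels and the `margins` helper are names chosen here, not taken from any cited paper, and it assumes the first number in each pair belongs to the newer method and the second to LongRoPE.

```python
# Scores copied from the head-to-head list above, as (newer method, LongRoPE).
# Assumption: in each "a vs b" pair, a is the newer method and b is LongRoPE.
results = {
    "RULER avg @128k, Phi3-mini (3.8B)": (58.81, 53.71),
    "RULER avg @128k, LLaMA3-8B": (82.03, 73.40),
    "Average, Phi3-mini (3.8B), 128k window": (61.7, 58.5),
    "Average, LLaMA3-8B, 128k window": (55.7, 54.6),
    "LOFT avg, Phi3-mini (3.8B)": (23.00, 21.14),
    "InfiniteBench - LongBench avg, Phi3-mini (3.8B)": (55.23, 50.67),
    "LOFT avg, LLaMA3-8B": (74.28, 60.85),
    "InfiniteBench - LongBench avg, LLaMA3-8B": (73.37, 70.39),
}


def margins(newer, baseline):
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = newer - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 2), round(relative, 1)


# Hypothetical summary structure: label -> (absolute gain, relative gain).
summary = {name: margins(*pair) for name, pair in results.items()}

for name, (abs_gain, rel_gain) in summary.items():
    print(f"{name}: +{abs_gain:.2f} points ({rel_gain:.1f}% relative)")
```

Under that reading, the reported gaps range from 1.1 points (Average, LLaMA3-8B with 128k context window) to 13.43 points (LOFT Avg., LLaMA3-8B). These are differences between copied table values, so they inherit any transcription error in the list.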
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.