Is LongRoPE superseded?

LongRoPE (Long-context / context-window extension): superseded, meaning it is cited as a baseline and beaten by newer methods. 3 papers critique it and 1 reports beating it on benchmarks, which ranks it #19 of 53 most-superseded. Sub-problem: cluster led by YaRN. Newer alternatives in the same sub-problem include Cross-Resolution Phase-Aligned Attention (CRPA), DoPE, and DyPE.


Superseded baseline · #19 of 53 most-superseded

LongRoPE

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Long-context / context-window extension · first seen Feb 21, 2024

superseded — cited as a baseline and beaten by newer methods

3 papers critique it · 1 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites LongRoPE as a baseline.

“traditional approaches chen2023extending often suffer from a significant performance drop chen2023clex, ding2024longrope at the target length due to their limited generalization capability.”
— DCIS: Efficient Length Extrapolation of LLMs via Divide-and-Conquer Scaling Factor Search
“rescaling factors derived from previous methods often fall short of achieving the effective target context length.”
— LongRoPE2: Near-Lossless LLM Context Window Scaling
“due to the exponential search space complexity, it is challenging for those methods to estimate an optimal frequency; they also need heavy searching cost, for instance, it costs LongRoPE nearly 3 days to search an optimal frequency for a 256k context window using an A100 GPU”
— PSC: Extending Context Window of Large Language Models via Phase Shift Calibration

Beaten on benchmarks

Head-to-head results where a newer method reports beating LongRoPE. Values are copied from the source paper's tables — verify against the cited paper.

LongRoPE2 beats LongRoPE · RULER average at 128k [Base model: Phi3-mini (3.8B)]
58.81 vs 53.71
LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats LongRoPE · RULER average at 128k [Base model: LLaMA3-8B]
82.03 vs 73.40
LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats LongRoPE · Average [Base model: Phi3-mini (3.8B) with 128k context window]
61.7 vs 58.5
LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats LongRoPE · Average [Base model: LLaMA3-8B with 128k context window]
55.7 vs 54.6
LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats LongRoPE · LOFT Avg. [Base model: Phi3-mini (3.8B)]
23.00 vs 21.14
LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats LongRoPE · InfiniteBench - LongBench Avg. [Base model: Phi3-mini (3.8B)]
55.23 vs 50.67
LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats LongRoPE · LOFT Avg. [Base model: LLaMA3-8B]
74.28 vs 60.85
LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats LongRoPE · InfiniteBench - LongBench Avg. [Base model: LLaMA3-8B]
73.37 vs 70.39
LongRoPE2: Near-Lossless LLM Context Window Scaling
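
To make the size of each gap easier to compare, the pairs above can be turned into absolute and relative margins. The sketch below is only illustrative: it assumes the first number in each "A vs B" pair is the score of the newer method from the cited LongRoPE2 paper and the second is LongRoPE's score, and the names `RESULTS` and `margins` are hypothetical helpers, not part of any paper or library.

```python
# Scores copied from the entries above.
# Assumption: in each "A vs B" pair, A is the newer method and B is LongRoPE.
RESULTS = [
    ("RULER average at 128k", "Phi3-mini (3.8B)", 58.81, 53.71),
    ("RULER average at 128k", "LLaMA3-8B", 82.03, 73.40),
    ("Average (128k context window)", "Phi3-mini (3.8B)", 61.7, 58.5),
    ("Average (128k context window)", "LLaMA3-8B", 55.7, 54.6),
    ("LOFT Avg.", "Phi3-mini (3.8B)", 23.00, 21.14),
    ("InfiniteBench - LongBench Avg.", "Phi3-mini (3.8B)", 55.23, 50.67),
    ("LOFT Avg.", "LLaMA3-8B", 74.28, 60.85),
    ("InfiniteBench - LongBench Avg.", "LLaMA3-8B", 73.37, 70.39),
]


def margins(results):
    """Return (metric, base model, absolute gain, relative gain in %) per row."""
    rows = []
    for metric, base, newer, longrope in results:
        gain = round(newer - longrope, 2)                      # points
        rel = round(100.0 * (newer - longrope) / longrope, 1)  # percent of LongRoPE's score
        rows.append((metric, base, gain, rel))
    return rows


if __name__ == "__main__":
    for metric, base, gain, rel in margins(RESULTS):
        print(f"{metric:32s} {base:18s} +{gain:5.2f} pts ({rel:+.1f}%)")
```

The margins are only as reliable as the copied values, so they should be checked against the cited paper's tables before being quoted.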

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.