NTK-aware
Long-context / context-window extension
superseded — cited as a baseline and beaten by newer methods
4 papers critique it · 6 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites NTK-aware as a baseline.
“rescaling factors derived from previous methods often fall short of achieving the effective target context length.”
— LongRoPE2: Near-Lossless LLM Context Window Scaling
“PE and NTK experience repetition issues, resulting in lower NoRepeat Score.”
— RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers
“However, these static approaches do not account for the distinctive spectral progression of the diffusion process, where low-frequency structures are generated in the first sampling steps, while high-frequency details are resolved later”
— DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion
“methods like NTK, Dyn-NTK, and YaRN suffer from attention logit outliers due to their positional embedding interpolations”
— A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI)
Beaten on benchmarks
Head-to-head results where a newer method reports beating NTK-aware. Values are copied from the source paper's tables — verify against the cited paper.
- DoPE: Denoising Rotary Position Embedding
DoPE-by-Gaussian beats NTK-aware · Original (24k) [24k tokens]
94.938 vs 91.896
- DoPE: Denoising Rotary Position Embedding
DoPE-by-Gaussian beats NTK-aware · Noisy (24k) [24k tokens]
84.354 vs 75.417
- DoPE: Denoising Rotary Position Embedding
DoPE-by-Gaussian beats NTK-aware · Original (64k) [64k tokens]
70.083 vs 60.938
- DoPE: Denoising Rotary Position Embedding
DoPE-by-Gaussian beats NTK-aware · Noisy (64k) [64k tokens]
45.667 vs 40.417
- LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats NTK-aware · RULER average at 128k [Base Model: Phi3-mini (3.8B)]
58.81 vs 49.37
- LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats NTK-aware · RULER average at 128k [Base Model: LLaMA3-8B]
82.03 vs 73.19
- LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats NTK-aware · Average [Base Model: Phi3-mini (3.8B) with 128k context window]
61.7 vs 57.3
- LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats NTK-aware · Average [Base Model: LLaMA3-8B with 128k context window]
55.7 vs 54.0
- LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats NTK-aware · LOFT Avg. [Base model: Phi3-mini (3.8B)]
23.00 vs 7.57
- LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats NTK-aware · InfiniteBench - LongBench Avg. [Base model: Phi3-mini (3.8B)]
55.23 vs 52.31
- LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats NTK-aware · LOFT Avg. [Base model: LLaMA3-8B]
74.28 vs 67.14
- LongRoPE2: Near-Lossless LLM Context Window Scaling
LongRoPE2 beats NTK-aware · InfiniteBench - LongBench Avg. [Base model: LLaMA3-8B]
73.37 vs 67.98
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.