KERPLE
KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation
Long-context / context-window extension · first seen May 20, 2022
Superseded: cited as a baseline and beaten by newer methods
2 papers critique it · 3 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites KERPLE as a baseline.
“the learned static positional encoding (such as Kerple and FIRE) is an average optimal solution across all training samples. Consequently, while they might be generally effective, they are inherently suboptimal for any specific instance.”
— DAPE: Data-Adaptive Positional Encoding for Length Extrapolation
“However, this incorporation of additional trainable parameters results in diminished training velocities.”
— MEP: Multiple Kernel Learning Enhancing Relative Positional Encoding Length Extrapolation
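Both critiques concern KERPLE's learned, distance-only bias. The following is a minimal sketch that assumes the logarithmic variant described in the KERPLE paper, in which each head adds -r1 * log(1 + r2 * |m - n|) to its attention logits and has two learnable scalars r1, r2 > 0. Because the bias depends only on distance, every input sequence receives the same bias once training ends. That is the "static" property DAPE objects to. The per-head scalars are presumably the extra trainable parameters the MEP quote refers to. The function names and the plain-list representation are illustrative assumptions, not KERPLE's reference code.

```python
import math
from typing import List

Matrix = List[List[float]]


def kerple_log_bias(seq_len: int, r1: float, r2: float) -> Matrix:
    """Causal KERPLE-log bias for one head (illustrative sketch).

    Entry [m][n] is -r1 * log(1 + r2 * (m - n)) for keys n <= m; future keys
    (n > m) are masked with -inf, as in a causal decoder.
    """
    if r1 <= 0 or r2 <= 0:
        raise ValueError("the log variant assumes r1 > 0 and r2 > 0")
    bias: Matrix = []
    for m in range(seq_len):  # query position
        row = []
        for n in range(seq_len):  # key position
            if n > m:
                row.append(-math.inf)  # causal mask: no attention to the future
            else:
                row.append(-r1 * math.log(1.0 + r2 * (m - n)))
        bias.append(row)
    return bias


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax that gives -inf entries zero weight."""
    peak = max(x for x in row if x != -math.inf)
    exps = [0.0 if x == -math.inf else math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(logits: Matrix, bias: Matrix) -> Matrix:
    """Add the input-independent bias to content logits (query-key scores), then softmax."""
    return [
        softmax([q + b for q, b in zip(logit_row, bias_row)])
        for logit_row, bias_row in zip(logits, bias)
    ]


if __name__ == "__main__":
    bias = kerple_log_bias(4, r1=1.0, r2=0.5)
    # Two different "inputs" (content logits) are combined with exactly the same bias.
    logits_a = [[0.0] * 4 for _ in range(4)]
    logits_b = [[float(i - j) for j in range(4)] for i in range(4)]
    for logits in (logits_a, logits_b):
        print([round(w, 3) for w in biased_attention(logits, bias)[3]])
```

For example, `kerple_log_bias(4, r1=1.0, r2=0.5)` assigns a bias of -log 2 between positions 2 and 0, and that value stays the same whichever tokens occupy those positions. Only the content logits passed to `biased_attention` vary between inputs.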
Beaten on benchmarks
Head-to-head results where a newer method reports beating KERPLE. Values are copied from the source papers' tables; verify them against the cited paper. Lower perplexity is better.
- DAPE: Data-Adaptive Positional Encoding for Length Extrapolation
DAPE-Kerple beats KERPLE · perplexity (mean) [trained at length 512, evaluated at length 8192]
3.8642 vs 13.3524
- DAPE: Data-Adaptive Positional Encoding for Length Extrapolation
DAPE-Kerple beats KERPLE · perplexity (mean) [trained at length 512, evaluated at length 2048]
4.0505 vs 5.4438
- Context-aware Biases for Length Extrapolation
CABLE beats KERPLE · Perplexity [GPT-2 Medium on FineWeb-Edu-10B, trained on T=1024]
15.41 vs 26.13
- MEP: Multiple Kernel Learning Enhancing Relative Positional Encoding Length Extrapolation
MEP beats KERPLE · Perplexity [OpenWebText2, parametric]
21.23 vs 21.27
- MEP: Multiple Kernel Learning Enhancing Relative Positional Encoding Length Extrapolation
MEP beats KERPLE · Perplexity [GitHub, parametric]
2.239 vs 2.242
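The five rows above come from different training lengths, datasets, and model sizes, so their raw perplexity gaps cannot be compared directly. The sketch below uses only the values listed above and normalizes each row to a relative reduction over KERPLE. The row labels are shorthand introduced here.

```python
from typing import List, Tuple

# (row label, newer method's perplexity, KERPLE's perplexity), copied from the list above.
RESULTS: List[Tuple[str, float, float]] = [
    ("DAPE-Kerple, train 512 / eval 8192", 3.8642, 13.3524),
    ("DAPE-Kerple, train 512 / eval 2048", 4.0505, 5.4438),
    ("CABLE, GPT-2 Medium, FineWeb-Edu-10B, T=1024", 15.41, 26.13),
    ("MEP, OpenWebText2, parametric", 21.23, 21.27),
    ("MEP, GitHub, parametric", 2.239, 2.242),
]


def relative_reduction(new: float, baseline: float) -> float:
    """Fraction by which `new` perplexity undercuts `baseline` (positive = better)."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    for label, new, kerple in RESULTS:
        print(f"{label}: {100 * relative_reduction(new, kerple):.2f}% lower perplexity")
```

On these numbers the DAPE-Kerple and CABLE rows show reductions of roughly 26% to 71%. Both MEP rows are below 0.2%.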
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Mask Prior Suppression and Monotonic RoPE Scaling · Mitigating Mask Prior Drift and Positional Attention Collapse in Large Diffusion Vision-Language Models · May 14, 2026
- Apr 1, 2026
- C^2RoPE · C^2ROPE: Causal Continuous Rotary Positional Encoding for 3D Large Multimodal-Models Reasoning · Feb 11, 2026
- Imaginary Extension of Rotary Position Embeddings · Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs · Dec 8, 2025
- Nov 21, 2025