Superseded baseline · #24 of 53 most-superseded
FIRE
Long-context / context-window extension
Superseded: cited as a baseline and beaten by newer methods
2 papers critique it · 1 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites FIRE as a baseline.
“Although FIRE utilizes MLPs to learn positional embeddings, these embeddings remain fixed across different tasks once the training is completed.”
— DAPE: Data-Adaptive Positional Encoding for Length Extrapolation
“However, as our experiments demonstrate, this behavior was not beneficial in our settings, leading to inferior performance compared to ALiBi and Kerple.”
— Context-aware Biases for Length Extrapolation
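The first critique contrasts a bias that depends only on positions with one that also depends on the input. The sketch below illustrates that distinction. It assumes the FIRE formulation b(i, j) = f(ψ(i − j) / ψ(max(L, i))) with ψ(x) = log(c·x + 1) and an MLP f that outputs one value per head. The DAPE-style function is a simplified stand-in: an MLP over the concatenated per-head attention scores and static bias. The paper's exact architecture may differ. All weights, `c`, `L`, sizes, and helper names (`init_mlp`, `fire_bias`, `dape_style_bias`, `random_scores`) are hypothetical placeholders, not trained values or a published API.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_mlp(d_in, d_hidden, d_out):
    """Random two-layer MLP weights (hypothetical placeholders, not trained values)."""
    return (rng.normal(size=(d_in, d_hidden)) * 0.5, np.zeros(d_hidden),
            rng.normal(size=(d_hidden, d_out)) * 0.5, np.zeros(d_out))


def mlp(x, w1, b1, w2, b2):
    """Apply a ReLU MLP over the last axis of x."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2


def fire_bias(n, params, c=1.0, L=512.0):
    """FIRE-style bias b(i, j) = f(psi(i - j) / psi(max(L, i))), psi(x) = log(c*x + 1).

    The only inputs are positions, so once the MLP is trained the bias is the
    same for every sequence and task. c and L are learnable in FIRE; here they
    are fixed assumptions. Returns shape (heads, n, n). Entries with j > i are
    removed by the causal mask in a real model.
    """
    def psi(x):
        return np.log(c * x + 1.0)

    i = np.arange(n, dtype=float)[:, None]
    j = np.arange(n, dtype=float)[None, :]
    ratio = psi(np.maximum(i - j, 0.0)) / psi(np.maximum(L, i))  # in [0, 1] for j <= i
    return np.moveaxis(mlp(ratio[..., None], *params), -1, 0)


def dape_style_bias(scores, static_bias, params):
    """Simplified DAPE-style bias (an assumption, not the paper's exact model).

    scores, static_bias: (heads, n, n). At each (i, j) the MLP maps the
    2*heads values [scores, static bias] to heads values, so the output
    changes with the input sequence through the attention scores.
    """
    x = np.moveaxis(np.concatenate([scores, static_bias], axis=0), 0, -1)  # (n, n, 2h)
    return np.moveaxis(mlp(x, *params), -1, 0)  # (heads, n, n)


heads, n, d, hidden = 2, 6, 8, 16
fire_params = init_mlp(1, hidden, heads)
dape_params = init_mlp(2 * heads, hidden, heads)


def random_scores():
    """Per-head QK^T / sqrt(d) for a random, hypothetical input sequence."""
    q = rng.normal(size=(heads, n, d))
    k = rng.normal(size=(heads, n, d))
    return q @ k.transpose(0, 2, 1) / np.sqrt(d)


scores_a, scores_b = random_scores(), random_scores()
B = fire_bias(n, fire_params)  # identical for both inputs
dape_a = dape_style_bias(scores_a, B, dape_params)
dape_b = dape_style_bias(scores_b, B, dape_params)

# FIRE bias: recomputing it gives the same matrix regardless of the input.
print(np.allclose(fire_bias(n, fire_params), B))
# DAPE-style bias: differs between the two inputs.
print(np.allclose(dape_a, dape_b))
```

Reusing the FIRE bias as the static term `B` is only a convenience here. The benchmarked DAPE-Kerple variant builds on a Kerple bias instead.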
Beaten on benchmarks
Head-to-head results where a newer method reports beating FIRE. Values are copied from the source paper's tables — verify against the cited paper.
- DAPE: Data-Adaptive Positional Encoding for Length Extrapolation
DAPE-Kerple beats FIRE · mean perplexity (lower is better) · trained at length 512, evaluated at length 8192
DAPE-Kerple 3.8642 vs FIRE 308.6173
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- Mask Prior Suppression and Monotonic RoPE Scaling · Mitigating Mask Prior Drift and Positional Attention Collapse in Large Diffusion Vision-Language Models · May 14, 2026
- Apr 1, 2026
- C^2RoPE · C^2ROPE: Causal Continuous Rotary Positional Encoding for 3D Large Multimodal-Models Reasoning · Feb 11, 2026
- Imaginary Extension of Rotary Position Embeddings · Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs · Dec 8, 2025
- Nov 21, 2025