LongLoRA
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Long-context / context-window extension · first seen Sep 21, 2023
superseded — cited as a baseline and beaten by newer methods
2 papers critique it · 3 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites LongLoRA as a baseline.
“However, this sparse attention mechanism is not applicable during inference, necessitating a return to the original full attention post-training.”
— FreqKV: Frequency Domain Key-Value Compression for Efficient Context Window Extension
“these methods typically require finetuning to achieve extension, which can be resource and time-intensive given the quadratic complexity of Transformers”
— LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
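Both critiques turn on the cost gap between full attention and a sparse, group-wise pattern. The sketch below is only a back-of-the-envelope count of query-key score entries, not LongLoRA's implementation: it assumes the sparse pattern restricts attention to non-overlapping groups of equal size, ignores causal masking, and uses a hypothetical group size of 2048 chosen purely for illustration.

```python
def full_attention_pairs(seq_len: int) -> int:
    """Score entries when every token attends to every token (quadratic in seq_len)."""
    return seq_len * seq_len


def grouped_attention_pairs(seq_len: int, group_size: int) -> int:
    """Score entries when attention is restricted to non-overlapping groups.

    Assumes seq_len is an exact multiple of group_size (an illustrative simplification).
    """
    if seq_len % group_size != 0:
        raise ValueError("seq_len must be a multiple of group_size")
    num_groups = seq_len // group_size
    return num_groups * group_size * group_size  # linear in seq_len for fixed group_size


if __name__ == "__main__":
    group_size = 2048  # hypothetical value, not taken from any cited paper
    for seq_len in (8192, 16384, 32768):
        full = full_attention_pairs(seq_len)
        grouped = grouped_attention_pairs(seq_len, group_size)
        print(f"{seq_len:>6} tokens: full={full:,} grouped={grouped:,} ratio={full / grouped:.0f}x")
```

Under these assumptions the saving grows linearly with context length, which is why a training-only sparse pattern is attractive, and why falling back to full attention at inference gives the saving back.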
Beaten on benchmarks
Head-to-head results where a newer method reports beating LongLoRA. Values are copied from the source paper's tables — verify against the cited paper.
- FreqKV: Frequency Domain Key-Value Compression for Efficient Context Window Extension
FreqKV beats LongLoRA · Perplexity [7B, 8K training]
7.45 vs 7.70
- FreqKV: Frequency Domain Key-Value Compression for Efficient Context Window Extension
FreqKV beats LongLoRA · Perplexity [7B, 16K training]
7.46 vs 7.65
- FreqKV: Frequency Domain Key-Value Compression for Efficient Context Window Extension
FreqKV beats LongLoRA · Perplexity [7B, 32K training]
7.47 vs 8.29
- EndPrompt: Efficient Long-Context Extension via Terminal Anchoring
ET beats LongLoRA · Avg. [RULER benchmark]
76.03 vs 72.95
- Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models
HARPE beats LongLoRA · Average RULER score [RULER tasks across context lengths]
70.40 vs 60.63
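To compare the entries above on a common footing, the margins can be recomputed as absolute and relative differences. This is a minimal sketch using only the values listed on this page; the `lower_is_better` flag is an assumption that perplexity is minimized and RULER averages are maximized, and the `results` and `margin` names are illustrative.

```python
# (newer method, setting, newer value, LongLoRA value, lower_is_better)
results = [
    ("FreqKV", "Perplexity [7B, 8K training]", 7.45, 7.70, True),
    ("FreqKV", "Perplexity [7B, 16K training]", 7.46, 7.65, True),
    ("FreqKV", "Perplexity [7B, 32K training]", 7.47, 8.29, True),
    ("ET", "Avg. [RULER benchmark]", 76.03, 72.95, False),
    ("HARPE", "Average RULER score", 70.40, 60.63, False),
]


def margin(newer: float, baseline: float, lower_is_better: bool) -> tuple[float, float]:
    """Return (absolute, relative %) improvement over the baseline.

    Positive values mean the newer method is better than the baseline.
    """
    diff = baseline - newer if lower_is_better else newer - baseline
    return diff, 100.0 * diff / baseline


if __name__ == "__main__":
    for method, setting, newer, baseline, lower in results:
        absolute, relative = margin(newer, baseline, lower)
        print(f"{method:<7} {setting:<32} +{absolute:.2f} ({relative:.1f}% vs LongLoRA)")
```

The figures come from different papers with potentially different models and evaluation setups, so the relative margins are comparable within one paper's rows but not necessarily across papers.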
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.
- May 14, 2026