CEPE (Long-context / context-window extension): superseded; cited as a baseline and beaten by newer methods. 2 papers critique it and 2 beat it on benchmarks, ranking it #15 of 53 most-superseded. Sub-problem: cluster led by Activation Beacon. Newer alternatives in the same sub-problem include SharedLLM and Gradual Forgetting.


Superseded baseline · #15 of 53 most-superseded

CEPE

Long-context / context-window extension

superseded — cited as a baseline and beaten by newer methods

2 papers critique it · 2 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites CEPE as a baseline.

“CEPE can process past context chunks in parallel, but these chunks must be passed through all its encoder layers (24-layer RoBERTa in CEPE) and layer-wise linear projections to obtain the final hidden states for cross-attention, leading to even slower inference speed than non-parallel Activation Beacon.”
— Stacked from One: Multi-Scale Self-Injection for Context Window Extension
“However, this heterogeneous architecture necessitates meticulous task design for the extra pretraining and warmup stages to stabilize the fine-tuning process.”
— Two are better than one: Context window extension with multi-grained self-injection

Beaten on benchmarks

Head-to-head results where a newer method reports beating CEPE. Perplexity is lower-is-better, so the smaller value in each comparison belongs to the newer method. Values are copied from the source paper's tables; verify against the cited paper.

SharedLLM beats CEPE · Perplexity
Source for all rows: Stacked from One: Multi-Scale Self-Injection for Context Window Extension

Benchmark    Context    SharedLLM    CEPE
Arxiv        4K         2.99         3.03
Arxiv        8K         2.97         3.02
Arxiv        32K        2.46         2.51
Arxiv        128K       2.91         2.97
PG19         4K         6.55         6.69
PG19         8K         6.28         6.40
PG19         32K        6.65         6.80
PG19         128K       5.96         6.10
ProofPile    4K         2.33         2.38
ProofPile    8K         2.34         2.43
ProofPile    32K        2.38         2.45
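
To put the perplexity gaps listed above in proportion, the short sketch below computes how much lower SharedLLM's perplexity is than CEPE's, as a percentage of the CEPE value. The numbers are the ones reported in this section; the `results` dictionary and the `relative_gap` helper are illustrative names introduced here, not something taken from either paper.

```python
# (benchmark, context length) -> (SharedLLM perplexity, CEPE perplexity).
# Values are copied from the listing in this section; verify against the cited paper.
results = {
    ("Arxiv", "4K"): (2.99, 3.03),
    ("Arxiv", "8K"): (2.97, 3.02),
    ("Arxiv", "32K"): (2.46, 2.51),
    ("Arxiv", "128K"): (2.91, 2.97),
    ("PG19", "4K"): (6.55, 6.69),
    ("PG19", "8K"): (6.28, 6.40),
    ("PG19", "32K"): (6.65, 6.80),
    ("PG19", "128K"): (5.96, 6.10),
    ("ProofPile", "4K"): (2.33, 2.38),
    ("ProofPile", "8K"): (2.34, 2.43),
    ("ProofPile", "32K"): (2.38, 2.45),
}


def relative_gap(newer: float, baseline: float) -> float:
    """Percent by which the newer method's perplexity is below the baseline's."""
    return 100.0 * (baseline - newer) / baseline


# Hypothetical helper output: one relative gap per (benchmark, context) pair.
gaps = {key: relative_gap(*values) for key, values in results.items()}

for (dataset, context), gap in gaps.items():
    print(f"{dataset:<10} {context:>5}  {gap:.2f}% lower perplexity than CEPE")
```

On these figures every gap is positive and falls between roughly 1% and 4%, with the largest at ProofPile 8K and the smallest at Arxiv 4K.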

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.