Is Self-Extend superseded?

Self-Extend (long-context / context-window extension) is superseded: newer papers cite it as a baseline and report beating it. 5 papers critique it and 4 beat it on benchmarks, which ranks it #8 of 53 most-superseded methods. Its sub-problem cluster is led by YaRN. Newer alternatives in the same sub-problem include Cross-Resolution Phase-Aligned Attention (CRPA), DoPE, and DyPE.

Superseded baseline · #8 of 53 most-superseded

Self-Extend

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

Long-context / context-window extension · first seen Jan 2, 2024

superseded — cited as a baseline and beaten by newer methods

5 papers critique it · 4 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites Self-Extend as a baseline.

“approaches like ReRoPE [rerope2023] and Self-Extend [jin2024llm] extend sequence lengths by compressing sequence indices, although they necessitate double attention computations, raising computational demands and limiting extrapolation potential.”
— DCIS: Efficient Length Extrapolation of LLMs via Divide-and-Conquer Scaling Factor Search
“However, they blindly manipulate the position embeddings equally on all RoPE dimensions without considering RoPE's rotational properties.”
— Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
“SelfExtend and ChunkLlama inherently disrupt local positional relationships, compromising model performance”
— A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI)
“SelfExtend's results on Llama3 in LongBench show that it impairs performance within the original window, which also highlights the limitations of manually setting group size.”
— LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
“While LongLM's method shows promising results on long-context tasks, we propose a more adaptive strategy grounded in the observation that, in natural language, the relevance of a token typically decreases with its distance from the current context.”
— SELF: Self-Extend the Context Length With Logistic Growth Function
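
The critiques above refer to Self-Extend's index compression, its manually chosen group size, and its effect on local positions. The following is a minimal sketch of the position mapping as commonly described for Self-Extend, not the authors' implementation: keys closer than a neighbor window keep their exact relative position, and more distant keys use floor-divided (grouped) positions plus a shift. The function and parameter names (`self_extend_rel_pos`, `group_size`, `neighbor_window`, `max_extended_length`) are chosen here for illustration. A full implementation would compute attention scores under both mappings and merge them, which is the double attention computation the DCIS quote mentions.

```python
def self_extend_rel_pos(q_pos: int, k_pos: int, group_size: int, neighbor_window: int) -> int:
    """Relative position used for one (query, key) pair under a Self-Extend-style mapping.

    Illustrative sketch: names and structure are assumptions, not the paper's code.
    """
    if k_pos > q_pos:
        raise ValueError("causal attention: the key must not come after the query")
    distance = q_pos - k_pos
    if distance < neighbor_window:
        # Neighbor attention: nearby tokens keep their exact relative position.
        return distance
    # Grouped attention: positions are compressed by floor division, then shifted
    # so the grouped range continues where the neighbor window ends.
    shift = neighbor_window - neighbor_window // group_size
    return q_pos // group_size + shift - k_pos // group_size


def max_extended_length(pretrain_window: int, group_size: int, neighbor_window: int) -> int:
    """Longest sequence whose mapped relative positions stay inside the pretraining window."""
    return (pretrain_window - neighbor_window) * group_size + neighbor_window


if __name__ == "__main__":
    G, W = 4, 8  # hypothetical group size and neighbor window
    for k in (97, 92, 91, 90, 0):
        print(f"query=100 key={k:>3} -> relative position {self_extend_rel_pos(100, k, G, W)}")
    print("max length for a 4096-token window:", max_extended_length(4096, G, W))
```

With these hypothetical settings, keys at distances 9 and 10 from the query map to the same relative position, which illustrates the loss of fine-grained positional distinctions that several of the quoted critiques point to. The result also depends directly on the hand-picked `group_size`, the limitation the LaMPE quote raises.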

Beaten on benchmarks

Head-to-head results where a newer method reports beating Self-Extend. Values are copied from the source paper's tables — verify against the cited paper.

Dimension-Wise Positional Embeddings Manipulation beats Self-Extend · Avg. (13 tasks) [Llama3 (8B), 8K / 128K]
56.08 vs 48.52
Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
Dimension-Wise Positional Embeddings Manipulation beats Self-Extend · Avg. (13 tasks) [Mistral-v0.2 (7B), 32K / 128K]
67.13 vs 65.04
Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
Dimension-Wise Positional Embeddings Manipulation beats Self-Extend · Avg. (13 tasks) [Qwen2.5 (7B), 128K / 128K]
70.78 vs 61.15
Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
Dimension-Wise Positional Embeddings Manipulation beats Self-Extend · Avg. (13 tasks) [Llama3.1 (70B), 128K / 128K]
86.39 vs 82.23
Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
TokenSelect beats Self-Extend · Avg. [Qwen2-7B]
49.08 vs 4.86
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
TokenSelect beats Self-Extend · Avg. [Llama-3-8B]
43.90 vs 29.53
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
TokenSelect beats Self-Extend · Avg. [Yi-1.5-6B]
36.77 vs 1.01
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
TokenSelect beats Self-Extend · Average [Qwen2-7B]
43.64 vs 18.65
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
TokenSelect beats Self-Extend · Average [Llama-3-8B]
44.04 vs 14.42
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
TokenSelect beats Self-Extend · Average [Yi-1.5-6B]
36.02 vs 18.53
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
HARPE beats Self-Extend · Average NiaH score [NiaH benchmark across context lengths]
86.82 vs 42.52
Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models
LaMPE beats Self-Extend · Avg. [Llama2-7B-Chat, 16K/25K setting]
35.07 vs 34.30
LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.