Self-Extend
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Long-context / context-window extension · first seen Jan 2, 2024
superseded — cited as a baseline and beaten by newer methods
5 papers critique it · 4 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites Self-Extend as a baseline.
“approaches like ReRoPE rerope2023 and Self-Extend jin2024llm extend sequence lengths by compressing sequence indices, although they necessitate double attention computations, raising computational demands and limiting extrapolation potential.”
— DCIS: Efficient Length Extrapolation of LLMs via Divide-and-Conquer Scaling Factor Search
“However, they blindly manipulate the position embeddings equally on all RoPE dimensions without considering RoPE's rotational properties.”
— Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
“SelfExtend and ChunkLlama inherently disrupt local positional relationships, compromising model performance”
— A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI)
“SelfExtend's results on Llama3 in LongBench show that it impairs performance within the original window, which also highlights the limitations of manually setting group size.”
— LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
“While LongLM's method shows promising results on long-context tasks, we propose a more adaptive strategy grounded in the observation that, in natural language, the relevance of a token typically decreases with its distance from the current context.”
— SELF: Self-Extend the Context Length With Logistic Growth Function
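For readers unfamiliar with what "compressing sequence indices" and "manually setting group size" refer to, the following is a minimal sketch of a Self-Extend-style position mapping. It assumes the floor-division grouping with a neighbor window described in the Self-Extend paper; the function and parameter names are illustrative and are not taken from any released implementation. The sketch covers only the index mapping. The "double attention computations" in the first critique refer to computing one attention pass with exact positions and one with grouped positions and then merging them, which is not shown here.

```python
def self_extend_relative_position(
    q_pos: int, k_pos: int, group_size: int, neighbor_window: int
) -> int:
    """Relative position seen by a (query, key) pair under the assumed mapping.

    group_size and neighbor_window are the two hand-set hyperparameters.
    """
    distance = q_pos - k_pos
    if distance < 0:
        raise ValueError("causal attention: the key must not come after the query")
    if distance < neighbor_window:
        # Nearby tokens keep their exact relative position.
        return distance
    # Distant tokens: positions are floor-divided by the group size, then
    # shifted so the grouped range starts at the edge of the neighbor window.
    shift = neighbor_window - neighbor_window // group_size
    return q_pos // group_size - k_pos // group_size + shift


if __name__ == "__main__":
    # Several distinct distances collapse onto one relative position.
    for k in (92, 91, 90, 89, 88):
        print(100 - k, self_extend_relative_position(100, k, 4, 8))
```

With a larger group size, more distinct distances share one relative position, which lets longer inputs fit inside the pretrained position range at the cost of coarser positional resolution for distant tokens.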
Beaten on benchmarks
Head-to-head results where a newer method reports beating Self-Extend. Values are copied from the source paper's tables — verify against the cited paper.
- Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
Dimension-Wise Positional Embeddings Manipulation beats Self-Extend · Avg. (13 tasks) [Llama3 (8B), 8K / 128K]
56.08 vs 48.52
- Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
Dimension-Wise Positional Embeddings Manipulation beats Self-Extend · Avg. (13 tasks) [Mistral-v0.2 (7B), 32K / 128K]
67.13 vs 65.04
- Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
Dimension-Wise Positional Embeddings Manipulation beats Self-Extend · Avg. (13 tasks) [Qwen2.5 (7B), 128K / 128K]
70.78 vs 61.15
- Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
Dimension-Wise Positional Embeddings Manipulation beats Self-Extend · Avg. (13 tasks) [Llama3.1 (70B), 128K / 128K]
86.39 vs 82.23
- TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
TokenSelect beats Self-Extend · Avg. [Qwen2-7B]
49.08 vs 4.86
- TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
TokenSelect beats Self-Extend · Avg. [Llama-3-8B]
43.90 vs 29.53
- TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
TokenSelect beats Self-Extend · Avg. [Yi-1.5-6B]
36.77 vs 1.01
- TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
TokenSelect beats Self-Extend · Average [Qwen2-7B]
43.64 vs 18.65
- TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
TokenSelect beats Self-Extend · Average [Llama-3-8B]
44.04 vs 14.42
- TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
TokenSelect beats Self-Extend · Average [Yi-1.5-6B]
36.02 vs 18.53
- Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models
HARPE beats Self-Extend · Average NiaH score [NiaH benchmark across context lengths]
86.82 vs 42.52
- LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats Self-Extend · Avg. [Llama2-7B-Chat, 16K/25K setting]
35.07 vs 34.30
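The reported gaps above vary widely, from under one point to more than forty. As a rough aid for reading the list, the sketch below computes the absolute margin for a few of the entries. The row labels and the `margins` helper are illustrative; the scores are copied from the entries above and come from different benchmarks and settings, so the margins are not comparable across papers.

```python
# (newer method, setting, newer score, Self-Extend score), copied from the list above
ROWS = [
    ("Dimension-Wise Positional Embeddings Manipulation", "Llama3 (8B), 8K / 128K", 56.08, 48.52),
    ("TokenSelect", "Avg. [Qwen2-7B]", 49.08, 4.86),
    ("HARPE", "Average NiaH score", 86.82, 42.52),
    ("LaMPE", "Llama2-7B-Chat, 16K/25K setting", 35.07, 34.30),
]


def margins(rows):
    """Return (method, setting, newer score minus Self-Extend score) per row."""
    return [
        (method, setting, round(new - base, 2))
        for method, setting, new, base in rows
    ]


for method, setting, gap in margins(ROWS):
    print(f"{method} | {setting} | +{gap:.2f}")
```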
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.