DCA (Dual Chunk Attention)
Long-context / context-window extension
superseded — cited as a baseline and beaten by newer methods
2 papers critique it · 2 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites DCA as a baseline.
“The two training-free length extrapolation baselines, Dual Chunk Attention and Positional Interpolation, shown in Table~tab:qwen_math_extrapolation, achieve accuracies close to zero, demonstrating astonishingly poor performance.”
— DoPE: Denoising Rotary Position Embedding
“Fixed group sizes are used for position mapping, regardless of varying input lengths. This lack of adaptability prevents the optimal utilization of well-trained short-range positions.”
— LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
Beaten on benchmarks
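Head-to-head results where a newer method reports beating DCA. Each pair is listed as newer method vs DCA; whether higher or lower is better depends on the metric. Values are copied from the source paper's tables and should be verified against the cited paper.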
- DoPE: Denoising Rotary Position Embedding
DoPE-by-Gaussian beats DCA · Needle Insert (8K) [8K context Many-shot]
0.393 vs 0
- LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats DCA · Avg. [Llama2-7B-Chat, 16K/25K setting]
35.07 vs 32.48
- LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats DCA · Avg. [Llama3-8B-Instruct, 16K/32K setting]
46.99 vs 44.70
- LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats DCA · Avg. [Llama3-8B-Instruct, 32K extrapolation]
27.02 vs 24.22
- LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats DCA · Avg. [Llama3-8B-Instruct, 64K extrapolation]
33.98 vs 31.03
- LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats DCA · Avg. [Llama3.1-8B-Instruct, 128K extrapolation]
62.85 vs 57.55
- LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats DCA · Avg. [Llama2-7B-Chat]
7.00 vs 7.23
- LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats DCA · Avg. [Llama3-8B-Instruct]
7.23 vs 7.43
- LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats DCA · Avg. [16K extrapolation]
87.32 vs 72.28
- LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats DCA · Avg. [24K extrapolation]
82.12 vs 67.53
- LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats DCA · Avg. [32K extrapolation]
79.12 vs 61.99
- LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats DCA · Avg. [40K extrapolation]
75.90 vs 56.45
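As a rough aid to reading the four extrapolation-length entries above (16K to 40K), the margins can be recomputed from the listed values. This is only a sketch: the numbers are the ones copied into this page (still to be verified against the LaMPE paper), it assumes a higher-is-better metric for these four rows, and the function name `margins` and the tuple layout are illustrative, not taken from any source.

```python
# (setting, newer-method score, DCA score), copied from the entries above.
# Assumption: for these four rows a higher score is better.
results = [
    ("16K extrapolation", 87.32, 72.28),
    ("24K extrapolation", 82.12, 67.53),
    ("32K extrapolation", 79.12, 61.99),
    ("40K extrapolation", 75.90, 56.45),
]


def margins(rows):
    """Return (setting, absolute margin, margin as a percentage of DCA's score)."""
    out = []
    for setting, new_score, dca_score in rows:
        diff = new_score - dca_score
        rel = 100.0 * diff / dca_score
        out.append((setting, round(diff, 2), round(rel, 1)))
    return out


for setting, diff, rel in margins(results):
    print(f"{setting}: +{diff:.2f} points ({rel:.1f}% relative to DCA)")
```

Reporting both the absolute and the relative margin helps here because DCA's own score drops as the context grows, so the same point gap represents a larger share of the baseline at 40K than at 16K.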
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.