DCA (Dual Chunk Attention; long-context / context-window extension): superseded, i.e. cited as a baseline and beaten by newer methods. 2 papers critique it and 2 beat it on benchmarks, placing it #16 of 53 most-superseded methods. Sub-problem: the cluster led by YaRN. Newer alternatives in the same sub-problem include Cross-Resolution Phase-Aligned Attention (CRPA), DoPE, and DyPE.

DCA

What papers say

Verbatim critique sentences, each from a paper that cites DCA as a baseline.

“The two training-free length extrapolation baselines, Dual Chunk Attention and Positional Interpolation, shown in Table [tab:qwen_math_extrapolation], achieve accuracies close to zero, demonstrating astonishingly poor performance.”
— DoPE: Denoising Rotary Position Embedding
“Fixed group sizes are used for position mapping, regardless of varying input lengths. This lack of adaptability prevents the optimal utilization of well-trained short-range positions.”
— LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
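The LaMPE critique targets position mappings whose granularity is fixed in advance. As an illustration only, the sketch below computes a Dual Chunk Attention style relative-distance matrix as we understand the original DCA formulation: key indices restart in every chunk of size `chunk_size`; for keys in the immediately preceding chunk, query indices are shifted so neighboring tokens stay close; for keys two or more chunks back, queries are pinned at `pretrain_len - 1`. The function name and parameters are hypothetical, and this is not code from the DCA, DoPE, or LaMPE authors.

```python
from typing import List, Optional


def dca_relative_positions(
    seq_len: int, chunk_size: int, pretrain_len: int
) -> List[List[Optional[int]]]:
    """Hypothetical helper: relative-distance matrix M[i][j] (query i, key j)
    under a DCA-style mapping. None marks causally masked pairs."""
    s, c = chunk_size, pretrain_len
    if not 0 < s < c:
        raise ValueError("expected 0 < chunk_size < pretrain_len")
    # Key indices restart at 0 in every chunk, so they never exceed s - 1.
    p_key = [j % s for j in range(seq_len)]
    # Intra-chunk queries reuse the key indices.
    p_intra = p_key
    # Successive-chunk queries: s + (i mod s), capped at c - 1. This keeps
    # distances to the previous chunk exact for the first w = c - s tokens.
    p_succ = [min(s + i % s, c - 1) for i in range(seq_len)]
    # Inter-chunk queries (keys two or more chunks back) are pinned at c - 1.
    p_inter = c - 1

    matrix: List[List[Optional[int]]] = []
    for i in range(seq_len):
        row: List[Optional[int]] = []
        for j in range(seq_len):
            if j > i:
                row.append(None)  # future token, masked by causal attention
                continue
            chunk_gap = i // s - j // s
            if chunk_gap == 0:
                q = p_intra[i]
            elif chunk_gap == 1:
                q = p_succ[i]
            else:
                q = p_inter
            row.append(q - p_key[j])
        matrix.append(row)
    return matrix


# Usage: chunk size 4, pretraining length 6, a 12-token input (2x pretraining length).
m = dca_relative_positions(12, chunk_size=4, pretrain_len=6)
print(m[4][3])  # adjacent tokens across a chunk boundary stay at distance 1
print(max(v for row in m for v in row if v is not None))  # bounded by pretrain_len - 1
```

The same `chunk_size` is applied whatever `seq_len` is. That fixed, length-agnostic granularity is the kind of design the LaMPE quote objects to, although reading its "group sizes" as DCA's chunks specifically is our interpretation rather than something the quote states.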

Beaten on benchmarks

Head-to-head results where a newer method reports beating DCA. Values are copied from the source paper's tables — verify against the cited paper.

DoPE-by-Gaussian beats DCA · Needle Insert (8K) [8K context Many-shot]
0.393 vs 0
DoPE: Denoising Rotary Position Embedding
LaMPE beats DCA · Avg. [Llama2-7B-Chat, 16K/25K setting]
35.07 vs 32.48
LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats DCA · Avg. [Llama3-8B-Instruct, 16K/32K setting]
46.99 vs 44.70
LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats DCA · Avg. [Llama3-8B-Instruct, 32K extrapolation]
27.02 vs 24.22
LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats DCA · Avg. [Llama3-8B-Instruct, 64K extrapolation]
33.98 vs 31.03
LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats DCA · Avg. [Llama3.1-8B-Instruct, 128K extrapolation]
62.85 vs 57.55
LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats DCA · Avg. [Llama2-7B-Chat]
7.00 vs 7.23 (listed as a win, so lower is better for this metric)
LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats DCA · Avg. [Llama3-8B-Instruct]
7.23 vs 7.43 (listed as a win, so lower is better for this metric)
LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats DCA · Avg. [16K extrapolation]
87.32 vs 72.28
LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats DCA · Avg. [24K extrapolation]
82.12 vs 67.53
LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats DCA · Avg. [32K extrapolation]
79.12 vs 61.99
LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
LaMPE beats DCA · Avg. [40K extrapolation]
75.90 vs 56.45
LaMPE: Length-aware Multi-grained Positional Encoding for Adaptive Long-context Scaling Without Training
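The four extrapolation-length rows (16K to 40K) use a metric where higher is better, so LaMPE's margin over DCA can be tabulated directly. This is a small sketch using only the values listed above; `results` and `margins` are illustrative names, and the model behind these rows is not stated on this page.

```python
# LaMPE vs DCA "Avg." values copied from the extrapolation rows above
# (higher is better in these rows). Format: setting -> (LaMPE, DCA).
results = {
    "16K": (87.32, 72.28),
    "24K": (82.12, 67.53),
    "32K": (79.12, 61.99),
    "40K": (75.90, 56.45),
}


def margins(rows):
    """Return {setting: (absolute gap in points, relative gap in % of DCA)}."""
    out = {}
    for setting, (lampe, dca) in rows.items():
        gap = lampe - dca
        out[setting] = (round(gap, 2), round(100 * gap / dca, 1))
    return out


for setting, (abs_gap, rel_gap) in margins(results).items():
    print(f"{setting}: +{abs_gap:.2f} points ({rel_gap:.1f}% over DCA)")
```

Run as-is, it shows the absolute margin dipping slightly at 24K (14.59 vs 15.04 points) before growing to 19.45 at 40K, while the relative margin grows at every step, from 20.8% at 16K to 34.5% at 40K.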

Newer alternatives

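Recent methods in the same sub-problem (the cluster led by YaRN) that are not yet superseded in the knowledge base include Cross-Resolution Phase-Aligned Attention (CRPA), DoPE, and DyPE.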