Is Activation Beacon superseded?

Activation Beacon (long-context / context-window extension) is superseded: it is cited as a baseline and beaten by newer methods. 2 papers critique it and 4 beat it on benchmarks, and it ranks #10 of 53 among the most-superseded methods. It leads its sub-problem cluster; newer alternatives in the same sub-problem include SharedLLM and Gradual Forgetting.

Superseded baseline · #10 of 53 most-superseded

Activation Beacon

Long Context Compression with Activation Beacon

Long-context / context-window extension · first seen Jan 7, 2024

What papers say

Verbatim critique sentences, each from a paper that cites Activation Beacon as a baseline.

“However, they need a copy of multi-head attention, which amounts to approximately 2B for 7B models.”
— FreqKV: Frequency Domain Key-Value Compression for Efficient Context Window Extension
“the compressed length still grows linearly with the original context length. This fails to fundamentally alter the asymptotic order of spatiotemporal complexity and can only improve efficiency by reducing constant factors”
— CoMeT: Collaborative Memory Transformer for Efficient Long Context Modeling
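Both critiques rest on simple arithmetic, and a rough sketch may make them easier to check. The block below is a back-of-envelope calculation, not code from any of the cited papers. It assumes the published LLaMA-2-7B shape (32 layers, hidden size 4096, four bias-free attention projections per layer) to show where a figure of roughly 2B parameters for a copy of multi-head attention could come from. The function `compressed_length` and its fixed ratio of 8 are hypothetical and serve only to illustrate that a constant compression ratio leaves the compressed length linear in the context length.

```python
# Back-of-envelope checks for the two critiques quoted above.
# Assumed (not stated on this page): LLaMA-2-7B has 32 layers and hidden size 4096,
# with four bias-free d x d attention projections (Q, K, V, O) per layer.


def attention_param_count(num_layers: int = 32, hidden_size: int = 4096) -> int:
    """Parameters in all multi-head attention projections of the model."""
    per_layer = 4 * hidden_size * hidden_size  # Q, K, V and output projections
    return num_layers * per_layer


def compressed_length(context_length: int, ratio: int) -> int:
    """Slots left after squeezing every `ratio` tokens into one (hypothetical fixed ratio)."""
    return -(-context_length // ratio)  # ceiling division


print(f"attention parameters: {attention_param_count() / 1e9:.2f}B")

# Doubling the context doubles the compressed length: the growth stays linear,
# only the constant factor (1 / ratio) changes.
for n in (4_096, 8_192, 16_384):
    print(n, compressed_length(n, ratio=8))
```

Under these assumptions the attention projections account for a bit over 2 billion parameters, which is in line with the "approximately 2B" quoted by FreqKV. The loop shows the pattern behind the CoMeT remark: a larger ratio shrinks every value but does not change how the values scale with context length.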

Beaten on benchmarks

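Head-to-head results where a newer method reports beating Activation Beacon, listed as newer method vs Activation Beacon. For perplexity, lower is better; for accuracy, higher is better. Values are copied from the source papers' tables, so verify them against the cited paper.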

SharedLLM beats Activation Beacon
Source: Stacked from One: Multi-Scale Self-Injection for Context Window Extension
Perplexity [LLaMA-2 4K] · 8.68 vs 9.21
Perplexity [LLaMA-2 16K] · 8.01 vs 8.34
Perplexity [LLaMA-2 32K] · 7.96 vs 8.27
Perplexity [LLaMA-2 100K] · 8.24 vs 8.50
Accuracy [LLaMA-2 SDQA] · 28.83 vs 28.27
Accuracy [LLaMA-2 MDQA] · 30.93 vs 28.44
Accuracy [LLaMA-2 Code] · 59.93 vs 57.75
Accuracy [Mistral-7B SDQA] · 30.75 vs 29.89

3D-RPE-LLaMA2-7B beats Activation Beacon
Source: 3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding
Perplexity [PG-19 at 8k context] · 7.03 vs 8.52
Perplexity [PG-19 at 16k context] · 7.10 vs 8.54
Perplexity [Proof-Pile at 8k context] · 2.72 vs 3.45
Perplexity [Proof-Pile at 16k context] · 2.93 vs 3.42
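
To put the perplexity gaps above on a common scale, it may help to express each one as a relative reduction against the Activation Beacon value reported in the same paper. The snippet below is a small illustrative sketch: the numbers are copied from the rows above, while the dictionary layout and the helper `relative_reduction` are hypothetical names introduced here.

```python
# Perplexity pairs copied from the rows above: (newer method, Activation Beacon).
# Lower perplexity is better, so a positive reduction favours the newer method.
perplexity_rows = {
    "SharedLLM, LLaMA-2 4K": (8.68, 9.21),
    "SharedLLM, LLaMA-2 16K": (8.01, 8.34),
    "SharedLLM, LLaMA-2 32K": (7.96, 8.27),
    "SharedLLM, LLaMA-2 100K": (8.24, 8.50),
    "3D-RPE, PG-19 8k": (7.03, 8.52),
    "3D-RPE, PG-19 16k": (7.10, 8.54),
    "3D-RPE, Proof-Pile 8k": (2.72, 3.45),
    "3D-RPE, Proof-Pile 16k": (2.93, 3.42),
}


def relative_reduction(new: float, baseline: float) -> float:
    """Fraction by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline


for name, (new, baseline) in perplexity_rows.items():
    print(f"{name}: {100 * relative_reduction(new, baseline):.1f}% lower perplexity")
```

The two source papers use different evaluation setups, so these percentages are best read within one paper at a time rather than compared across papers.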

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.