Activation Beacon
Long Context Compression with Activation Beacon
Long-context / context-window extension · first seen Jan 7, 2024
superseded — cited as a baseline and beaten by newer methods
2 papers critique it · 4 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites Activation Beacon as a baseline.
“However, they need a copy of multi-head attention, which amounts to approximately 2B for 7B models.”
— FreqKV: Frequency Domain Key-Value Compression for Efficient Context Window Extension
“the compressed length still grows linearly with the original context length. This fails to fundamentally alter the asymptotic order of spatiotemporal complexity and can only improve efficiency by reducing constant factors”
— CoMeT: Collaborative Memory Transformer for Efficient Long Context Modeling
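A rough check of the two critiques above, as a minimal Python sketch. The layer count (32) and hidden size (4096) are assumed LLaMA-2-7B-style values, and the compression ratio of 8 is purely illustrative; none of these figures come from the quoted papers. The sketch also assumes the duplicated attention consists of four square projection matrices (query, key, value, output) per layer.

```python
# Assumptions (not stated in the quoted papers): a LLaMA-2-7B-style model with
# 32 layers and hidden size 4096, and four square attention projections
# (query, key, value, output) per layer, without biases.

def attention_params(num_layers: int = 32,
                     hidden_size: int = 4096,
                     num_projections: int = 4) -> int:
    """Count the weights in the attention projection matrices of all layers."""
    return num_layers * num_projections * hidden_size * hidden_size


def compressed_length(context_length: int, ratio: int) -> int:
    """Compressed units kept for a context under a fixed ratio (ceiling division)."""
    return -(-context_length // ratio)


if __name__ == "__main__":
    # Size of one extra copy of multi-head attention.
    print(f"duplicated attention weights: {attention_params() / 1e9:.2f}B")

    # With a fixed ratio (8 is an illustrative value), the compressed length
    # scales in proportion to the original context length.
    for n in (4_096, 16_384, 32_768, 131_072):
        print(f"context {n:>7} -> compressed {compressed_length(n, ratio=8)}")
```

Under these assumptions the duplicated projections come to about 2.1B weights, in line with the "approximately 2B" figure quoted by FreqKV, and doubling the context doubles the compressed length, which is the linear growth CoMeT objects to.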
Beaten on benchmarks
Head-to-head results where a newer method reports beating Activation Beacon. Values are copied from the source paper's tables — verify against the cited paper.
- Stacked from One: Multi-Scale Self-Injection for Context Window Extension
SharedLLM beats Activation Beacon · Perplexity [LLaMA-2 4K]
8.68 vs 9.21
- Stacked from One: Multi-Scale Self-Injection for Context Window Extension
SharedLLM beats Activation Beacon · Perplexity [LLaMA-2 16K]
8.01 vs 8.34
- Stacked from One: Multi-Scale Self-Injection for Context Window Extension
SharedLLM beats Activation Beacon · Perplexity [LLaMA-2 32K]
7.96 vs 8.27
- Stacked from One: Multi-Scale Self-Injection for Context Window Extension
SharedLLM beats Activation Beacon · Perplexity [LLaMA-2 100K]
8.24 vs 8.50
- Stacked from One: Multi-Scale Self-Injection for Context Window Extension
SharedLLM beats Activation Beacon · Accuracy [LLaMA-2 SDQA]
28.83 vs 28.27
- Stacked from One: Multi-Scale Self-Injection for Context Window Extension
SharedLLM beats Activation Beacon · Accuracy [LLaMA-2 MDQA]
30.93 vs 28.44
- Stacked from One: Multi-Scale Self-Injection for Context Window Extension
SharedLLM beats Activation Beacon · Accuracy [LLaMA-2 Code]
59.93 vs 57.75
- Stacked from One: Multi-Scale Self-Injection for Context Window Extension
SharedLLM beats Activation Beacon · Accuracy [Mistral-7B SDQA]
30.75 vs 29.89
- 3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding
3D-RPE-LLaMA2-7B beats Activation Beacon · Perplexity [PG-19 at 8k context]
7.03 vs 8.52
- 3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding
3D-RPE-LLaMA2-7B beats Activation Beacon · Perplexity [PG-19 at 16k context]
7.10 vs 8.54
- 3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding
3D-RPE-LLaMA2-7B beats Activation Beacon · Perplexity [Proof-Pile at 8k context]
2.72 vs 3.45
- 3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding
3D-RPE-LLaMA2-7B beats Activation Beacon · Perplexity [Proof-Pile at 16k context]
2.93 vs 3.42
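To put the gaps above on a common scale, the following sketch computes the relative perplexity reduction for each head-to-head perplexity entry. The values are the ones listed above; the helper name relative_reduction is hypothetical, and numbers reported by different papers are not directly comparable to one another.

```python
def relative_reduction(newer: float, baseline: float) -> float:
    """Percent by which `newer` is below `baseline` (lower-is-better metrics)."""
    return (baseline - newer) / baseline * 100.0


# (setting, newer method, Activation Beacon), copied from the entries above.
perplexity_rows = [
    ("SharedLLM, LLaMA-2 4K", 8.68, 9.21),
    ("SharedLLM, LLaMA-2 16K", 8.01, 8.34),
    ("SharedLLM, LLaMA-2 32K", 7.96, 8.27),
    ("SharedLLM, LLaMA-2 100K", 8.24, 8.50),
    ("3D-RPE, PG-19 at 8k", 7.03, 8.52),
    ("3D-RPE, PG-19 at 16k", 7.10, 8.54),
    ("3D-RPE, Proof-Pile at 8k", 2.72, 3.45),
    ("3D-RPE, Proof-Pile at 16k", 2.93, 3.42),
]

if __name__ == "__main__":
    for setting, newer, beacon in perplexity_rows:
        gap = relative_reduction(newer, beacon)
        print(f"{setting}: {gap:.1f}% lower perplexity than Activation Beacon")
```

The accuracy entries are higher-is-better, so they would need the sign of the difference flipped before being compared this way.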
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.