Mamba (Long-context / context-window extension): superseded, cited as a baseline and beaten by newer methods. 3 papers critique it and 1 beats it on benchmarks; it ranks #20 of 53 most-superseded. Sub-problem: cluster led by Activation Beacon. Newer alternatives in the same sub-problem include SharedLLM and Gradual Forgetting.

Superseded baseline · #20 of 53 most-superseded

Mamba

Long-context / context-window extension

superseded — cited as a baseline and beaten by newer methods

3 papers critique it · 1 beats it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites Mamba as a baseline.

“Mamba and RWKV maintain compact fixed-size states but struggle with fine-grained retrieval at long range”
— The Impossibility Triangle of Long-Context Modeling
“we show, through a series of visualizations, analyses, and empirical measures, that the main barrier is Mamba's implicit bias towards sequence lengths that were seen during training, a phenomenon that we call 'limited effective receptive field' (ERF).”
— DeciMamba: Exploring the Length Extrapolation Potential of Mamba
“these recurrent sequence methods are specifically designed as architectural alternatives to Transformers and cannot be directly applied to existing pre-trained LLMs in a plug-and-play manner, requiring models to be trained from scratch and thus limiting their adoption in the current LLM ecosystem”
— CoMeT: Collaborative Memory Transformer for Efficient Long Context Modeling

Beaten on benchmarks

Head-to-head results where a newer method reports beating Mamba. Values are copied from the source paper's tables — verify against the cited paper.

DeciMamba beats Mamba · LongBench Score [instruction-tuned Mamba-2.8b zero-shot]
DeciMamba 12.61 vs Mamba 3.93
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
DeciMamba beats Mamba · F1 [TriviaQA 4-8k context]
DeciMamba 14.02 vs Mamba 5.59
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
+Deci beats Mamba · Accuracy [Retrieval with varying document count]
+Deci 5.3 vs Mamba 0.3
DeciMamba: Exploring the Length Extrapolation Potential of Mamba

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base.