Superseded baseline · #20 of 53 most-superseded
Mamba
Long-context / context-window extension
superseded — cited as a baseline and beaten by newer methods
3 papers critique it · 1 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites Mamba as a baseline.
“Mamba and RWKV maintain compact fixed-size states but struggle with fine-grained retrieval at long range”
— The Impossibility Triangle of Long-Context Modeling
“we show, through a series of visualizations, analyses, and empirical measures, that the main barrier is Mamba's implicit bias towards sequence lengths that were seen during training, a phenomenon that we call 'limited effective receptive field' (ERF).”
— DeciMamba: Exploring the Length Extrapolation Potential of Mamba
“these recurrent sequence methods are specifically designed as architectural alternatives to Transformers and cannot be directly applied to existing pre-trained LLMs in a plug-and-play manner, requiring models to be trained from scratch and thus limiting their adoption in the current LLM ecosystem”
— CoMeT: Collaborative Memory Transformer for Efficient Long Context Modeling
Beaten on benchmarks
Head-to-head results where a newer method reports beating Mamba. Values are copied from the source paper's tables — verify against the cited paper.
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba
DeciMamba beats Mamba · LongBench Score [instruction-tuned Mamba-2.8b zero-shot]
DeciMamba 12.61 vs Mamba 3.93
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba
DeciMamba beats Mamba · F1 [TriviaQA 4-8k context]
DeciMamba 14.02 vs Mamba 5.59
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba
+Deci beats Mamba · Accuracy [Retrieval with varying document count]
+Deci 5.3 vs Mamba 0.3
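To put the three head-to-head results above on a common footing, the gap can be expressed as an absolute gain and as a ratio over the Mamba baseline. The following is a minimal sketch, assuming the first number in each pair belongs to the newer method and the second to Mamba, and that higher is better for all three metrics; the `results` dictionary and the `summarize` helper are illustrative names, not part of any cited paper.

```python
# Values copied from the "Beaten on benchmarks" entries above,
# stored as (newer method, Mamba baseline). Assumption: higher is better.
results = {
    "LongBench score": (12.61, 3.93),
    "TriviaQA F1 (4-8k context)": (14.02, 5.59),
    "Retrieval accuracy (varying document count)": (5.3, 0.3),
}


def summarize(new, base):
    """Return (absolute gain, ratio of new score to baseline score)."""
    return new - base, new / base


for name, (new, base) in results.items():
    gain, ratio = summarize(new, base)
    print(f"{name}: +{gain:.2f} absolute, {ratio:.1f}x the Mamba score")
```

The three rows come from different evaluation settings, so the ratios describe each comparison separately and should not be averaged or ranked against each other.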
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.