LG | Feb 19

2Mamba2Furious: Linear in Complexity, Competitive in Accuracy

arXiv:2602.17363v1 | h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses the reduced accuracy of efficient linear attention transformers, a problem relevant to machine learning practitioners, and represents an incremental improvement.

The paper tackles the accuracy gap between linear attention transformers and softmax attention by simplifying and then enhancing Mamba-2. The resulting method, 2Mamba, is nearly as accurate as softmax attention while being more memory efficient for long contexts.

Linear attention transformers have become a strong alternative to softmax attention due to their efficiency. However, linear attention tends to be less expressive and results in reduced accuracy compared to softmax attention. To bridge the accuracy gap between softmax attention and linear attention, we manipulate Mamba-2, a very strong linear attention variant. We first simplify Mamba-2 down to its most fundamental and important components, evaluating which specific choices make it most accurate. From this simplified Mamba variant (Mamba-2S), we improve the A-mask and increase the order of the hidden state, resulting in a method, which we call 2Mamba, that is nearly as accurate as softmax attention, yet much more memory efficient for long context lengths. We also investigate elements of Mamba-2 that help surpass softmax attention accuracy. Code is provided for all our experiments.
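To make the memory argument concrete, here is a minimal pure-Python sketch of generic linear attention with a scalar decay, the family that Mamba-2 belongs to. It is not the paper's 2Mamba or Mamba-2S: the improved A-mask and the higher-order hidden state are not reproduced, and the function names `masked_quadratic` and `recurrent`, along with the toy inputs, are assumptions made for illustration. The sketch computes the same outputs two ways, once with an explicit causal decay mask over all token pairs and once with a fixed-size running state.

```python
import math


def masked_quadratic(q, k, v, a):
    """Quadratic form: y_t = sum_{s<=t} (a_{s+1} * ... * a_t) * (q_t . k_s) * v_s."""
    n, d_v = len(q), len(v[0])
    out = []
    for t in range(n):
        y = [0.0] * d_v
        for s in range(t + 1):
            # Entry (t, s) of the causal decay mask; the empty product is 1.
            decay = math.prod(a[s + 1 : t + 1])
            score = sum(qi * ki for qi, ki in zip(q[t], k[s]))
            for j in range(d_v):
                y[j] += decay * score * v[s][j]
        out.append(y)
    return out


def recurrent(q, k, v, a):
    """Recurrent form: S_t = a_t * S_{t-1} + k_t v_t^T, then y_t = q_t^T S_t."""
    d_k, d_v = len(k[0]), len(v[0])
    # The state is d_k x d_v regardless of how many tokens have been seen.
    state = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q_t, k_t, v_t, a_t in zip(q, k, v, a):
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] = a_t * state[i][j] + k_t[i] * v_t[j]
        out.append([sum(q_t[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


# Toy inputs (hypothetical values): 4 tokens, key dim 2, value dim 3.
q = [[0.5, -1.0], [1.0, 0.25], [-0.5, 0.75], [0.2, 0.1]]
k = [[1.0, 0.0], [0.5, 0.5], [-1.0, 2.0], [0.3, -0.7]]
v = [[1.0, 2.0, 3.0], [0.0, -1.0, 1.0], [2.0, 0.5, -0.5], [1.5, 1.5, 0.0]]
a = [0.9, 0.8, 0.95, 0.5]  # per-token scalar decay in (0, 1]

y_quadratic = masked_quadratic(q, k, v, a)
y_recurrent = recurrent(q, k, v, a)
```

Both functions return the same values up to floating-point error. The recurrent form only ever stores a `d_k x d_v` state, which is why this family stays memory efficient as context length grows, whereas softmax attention must keep every past key and value.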

