LG, AI | Jun 20, 2024

DeciMamba: Exploring the Length Extrapolation Potential of Mamba

arXiv:2406.14528v3 | 44 citations
Originality: Incremental advance
AI Analysis

This work addresses a key bottleneck in efficient long-range sequence modeling and offers a practical improvement for NLP applications.

The paper tackles Mamba's limited length generalization in long-range sequence processing by introducing DeciMamba, a context-extension method that lets a trained model extrapolate to significantly longer contexts without additional training, while also running inference faster on real-world NLP tasks.

Long-range sequence processing poses a significant challenge for Transformers due to their quadratic complexity in input length. A promising alternative is Mamba, which demonstrates high performance and achieves Transformer-level capabilities while requiring substantially fewer computational resources. In this paper we explore the length-generalization capabilities of Mamba, which we find to be relatively limited. Through a series of visualizations and analyses we identify that the limitations arise from a restricted effective receptive field, dictated by the sequence length used during training. To address this constraint, we introduce DeciMamba, a context-extension method specifically designed for Mamba. This mechanism, built on top of a hidden filtering mechanism embedded within the S6 layer, enables the trained model to extrapolate well even without additional training. Empirical experiments over real-world long-range NLP tasks show that DeciMamba can extrapolate to context lengths that are significantly longer than the ones seen during training, while enjoying faster inference.
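The abstract does not spell out how the filtering works, so the following is only a minimal sketch of the general idea of decimation: shortening a sequence by keeping the tokens judged most important. It assumes each token already carries a scalar importance score (for example, one derived from the S6 layer's per-token step size; that choice is an assumption here, not something stated above). The function name `decimate_tokens` and its parameters are hypothetical and are not taken from the authors' code.

```python
def decimate_tokens(tokens, scores, keep):
    """Keep the `keep` highest-scoring tokens, preserving their original order.

    Illustrative only: `scores` stands in for a per-token importance
    signal; how the real method computes it is an assumption here.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if keep < 0:
        raise ValueError("keep must be non-negative")
    if keep >= len(tokens):
        # Nothing to drop: the sequence is already short enough.
        return list(tokens), list(range(len(tokens)))

    # Rank positions by score, highest first (ties keep the earlier position).
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    # Restore sequence order so downstream layers still see a causal sequence.
    kept = sorted(ranked[:keep])
    return [tokens[i] for i in kept], kept


tokens = ["a", "b", "c", "d", "e", "f"]
scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7]
short_tokens, kept_positions = decimate_tokens(tokens, scores, keep=3)
print(short_tokens)    # ['b', 'd', 'f']
print(kept_positions)  # [1, 3, 5]
```

Under this assumption, capping `keep` near the training length would keep the sequence seen by later layers within the range the model was trained on, and the shorter sequence is also one plausible reason for faster inference.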

Code Implementations: 2 repos