DeciMamba: Exploring the Length Extrapolation Potential of Mamba
DeciMamba addresses a key bottleneck for efficient long-range sequence modeling, offering a practical improvement for NLP applications.
The paper tackles Mamba's limited length generalization in long-range sequence processing by introducing DeciMamba, a context-extension method that lets a trained model extrapolate to significantly longer contexts without additional training, while also achieving faster inference on real-world NLP tasks.
Long-range sequence processing poses a significant challenge for Transformers due to their quadratic complexity in input length. A promising alternative is Mamba, which achieves Transformer-level performance while requiring substantially fewer computational resources. In this paper, we explore the length-generalization capabilities of Mamba, which we find to be relatively limited. Through a series of visualizations and analyses, we trace this limitation to a restricted effective receptive field, dictated by the sequence length used during training. To address this constraint, we introduce DeciMamba, a context-extension method specifically designed for Mamba. Built on top of a hidden filtering mechanism embedded within the S6 layer, DeciMamba enables the trained model to extrapolate well even without additional training. Experiments on real-world long-range NLP tasks show that DeciMamba can extrapolate to context lengths significantly longer than those seen during training, while enjoying faster inference.
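To make the receptive-field argument concrete, here is a minimal single-channel sketch in NumPy. It assumes the standard S6 recurrence h_t = exp(Δ_t·a)·h_{t−1} + Δ_t·b·x_t with a < 0 (zero-order hold for A, the simplified Δ·B form for the input), and shows that a past token's weight in the current state decays with the sum of the Δ values that follow it, so the receptive field measured in tokens is bounded. The function names, the tolerance-based `receptive_field` measure, and the top-k-by-Δ `decimate` rule are hypothetical illustrations of the filtering and decimation idea, not the paper's exact selection rule, layer placement, or pool sizes.

```python
import numpy as np

# Toy single-channel S6-style recurrence (a sketch, not Mamba's fused kernel):
#   h_t = exp(delta_t * a) * h_{t-1} + delta_t * b * x_t,   with a < 0,
# so exp(delta_t * a) in (0, 1) is a forget factor and delta_t also scales
# how strongly x_t is written into the state.


def s6_scan(x, delta, a=-1.0, b=1.0):
    """Return the hidden state after every step of the recurrence."""
    h = 0.0
    states = np.empty(len(x))
    for t in range(len(x)):
        h = np.exp(delta[t] * a) * h + delta[t] * b * x[t]
        states[t] = h
    return states


def token_influence(delta, a=-1.0, b=1.0):
    """Weight of each input x_s in the final state h_T.

    Unrolling the recurrence gives
        h_T = sum_s b * delta_s * exp(a * sum_{k > s} delta_k) * x_s,
    so a token fades at a rate set by the deltas of the tokens after it.
    """
    # Sum of delta strictly after each position (0 for the last token).
    after = np.concatenate([np.cumsum(delta[::-1])[::-1][1:], [0.0]])
    return b * delta * np.exp(a * after)


def receptive_field(delta, a=-1.0, tol=1e-3):
    """Hypothetical measure: number of tokens whose influence on the final
    state exceeds tol times the largest influence."""
    w = token_influence(delta, a)
    return int(np.sum(w > tol * w.max()))


def decimate(x, delta, keep):
    """Hypothetical decimation rule: keep the `keep` tokens with the largest
    delta (stable ranking), returned in their original order."""
    idx = np.sort(np.argsort(-delta, kind="stable")[:keep])
    return x[idx], delta[idx]


# Constant delta: the receptive field is roughly ln(1 / tol) / delta tokens.
print([receptive_field(np.full(1000, d)) for d in (1.0, 0.1, 0.01)])  # → [7, 70, 691]

# The unrolled weights reproduce the recurrence's final state.
rng = np.random.default_rng(0)
x = rng.standard_normal(50)
delta = rng.uniform(0.01, 1.0, size=50)
print(np.isclose(s6_scan(x, delta)[-1], token_influence(delta) @ x))  # → True

# Decimation keeps the high-delta tokens and preserves their order.
scores = np.array([0.1, 0.9, 0.2, 0.8, 0.05, 0.7, 0.3, 0.0, 0.6, 0.4])
kept, _ = decimate(np.arange(10), scores, keep=3)
print(kept)  # → [1 3 5]
```

In this toy, tokens with small Δ are both written weakly and barely erase the state, which is the filtering behaviour the abstract refers to. Dropping low-Δ tokens before a layer shortens the sequence that layer scans, so a receptive field of fixed size in tokens spans a longer stretch of the original context.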