LG | Apr 14, 2025

M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Junxiong Wang, Wen-Ding Li, Daniele Paliotta, Daniel Ritter, Alexander M. Rush, Tri Dao

arXiv:2504.10449v326.420 citations | h-index: 9 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the computational inefficiency of transformer-based models on long chain-of-thought reasoning, offering a more scalable option for mathematics and similar domains. It is incremental in that it builds on existing Mamba and distillation techniques.

The paper tackles the problem of scaling test-time computation for complex mathematical reasoning by introducing M1, a hybrid linear RNN model based on Mamba. M1 generates more than 3x faster than a same-size transformer served with vLLM and matches the performance of DeepSeek R1 distilled reasoning models of similar scale on the AIME and MATH benchmarks.

Effective reasoning is crucial to solving complex mathematical problems. Recent large language models (LLMs) have boosted performance by scaling test-time computation through long chain-of-thought reasoning. However, transformer-based models are inherently limited in extending context length due to their quadratic computational complexity and linear memory requirements. In this paper, we introduce a novel hybrid linear RNN reasoning model, M1, built on the Mamba architecture, which allows memory-efficient inference. Our approach leverages a distillation process from existing reasoning models and is further enhanced through RL training. Experimental results on the AIME and MATH benchmarks show that M1 not only outperforms previous linear RNN models but also matches the performance of state-of-the-art DeepSeek R1 distilled reasoning models at a similar scale. We also compare our generation speed with a highly performant general-purpose inference engine, vLLM, and observe more than a 3x speedup compared to a same-size transformer. With this throughput speedup, we are able to achieve higher accuracy than DeepSeek R1 distilled transformer reasoning models under a fixed generation time budget using self-consistency voting. Overall, we introduce a hybrid Mamba reasoning model and provide a more effective approach to scaling test-time generation using self-consistency or long chain-of-thought reasoning.
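The fixed-budget argument in the abstract can be illustrated with a minimal sketch of self-consistency voting. This is not the authors' code: the function names, the toy timings, and the sampled answers below are all hypothetical, and the budget calculation assumes sequential generation with a constant time per sample, which ignores batching.

```python
from collections import Counter


def majority_vote(answers):
    """Self-consistency: return the most frequent final answer.

    Ties are broken in favor of the answer that was sampled first.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    # max() returns the first maximal key; Counter keeps insertion order.
    return max(counts, key=counts.get)


def samples_within_budget(budget_s, seconds_per_sample):
    """Number of complete generations that fit in a wall-clock budget.

    Assumption: samples are generated one after another at a fixed cost.
    """
    return int(budget_s // seconds_per_sample)


# Hypothetical timings: a 3x faster model fits 3x as many votes
# into the same 60-second budget.
baseline_samples = samples_within_budget(60.0, 6.0)
faster_samples = samples_within_budget(60.0, 2.0)

# Hypothetical final answers extracted from sampled reasoning chains.
voted = majority_vote(["42", "41", "42", "7"])
```

Under these assumptions, a higher-throughput model casts more votes per problem within the same time limit, which is the mechanism the abstract credits for the accuracy gain at a fixed generation time budget.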
