AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
This work addresses efficient and flexible modulation of reasoning in large reasoning models, offering an incremental improvement over existing monotonic scaling methods.
The paper tackles the problem of modulating reasoning progress in large reasoning models at test time. It introduces AlphaOne, a framework that dynamically schedules slow-thinking transitions and then deterministically terminates slow thinking to switch to fast reasoning. The authors report superior reasoning capability and efficiency across mathematical, coding, and scientific benchmarks.
This paper presents AlphaOne ($α$1), a universal framework for modulating reasoning progress in large reasoning models (LRMs) at test time. $α$1 first introduces the $α$ moment, which represents the scaled thinking phase with a universal parameter $α$. Within this scaled pre-$α$ moment phase, it dynamically schedules slow-thinking transitions by modeling the insertion of reasoning transition tokens as a Bernoulli stochastic process. After the $α$ moment, $α$1 deterministically terminates slow thinking with the end-of-thinking token, thereby fostering fast reasoning and efficient answer generation. This approach unifies and generalizes existing monotonic scaling methods by enabling flexible and dense slow-to-fast reasoning modulation. Extensive empirical studies on various challenging benchmarks across mathematical, coding, and scientific domains demonstrate $α$1's superior reasoning capability and efficiency. Project page: https://alphaone-project.github.io/
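
The two-phase schedule described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `modulate`, the token strings `SLOW_TOKEN` and `END_THINK`, and the parameters `base_think_len` and `p_slow` are all hypothetical. It assumes the $α$ moment falls at $α$ times some baseline thinking length, and it uses a constant insertion probability, whereas the abstract only states that insertion is modeled as a Bernoulli process.

```python
import random
from typing import Callable, List

# Hypothetical token strings; the abstract does not name the actual tokens.
SLOW_TOKEN = "wait"        # assumed slow-thinking transition token
END_THINK = "</think>"     # assumed end-of-thinking token


def modulate(
    next_token: Callable[[List[str]], str],
    alpha: float,
    base_think_len: int,
    p_slow: float,
    max_tokens: int,
    rng: random.Random,
) -> List[str]:
    """Toy slow-to-fast modulation of a token stream.

    Assumption: the alpha moment is alpha * base_think_len tokens.
    Before it, a transition token is inserted after each generated token
    with probability p_slow (one Bernoulli trial per step). At the alpha
    moment, the end-of-thinking token is appended once, and no further
    transition tokens are inserted.
    """
    alpha_moment = int(alpha * base_think_len)
    out: List[str] = []
    thinking = True
    while len(out) < max_tokens:
        if thinking and len(out) >= alpha_moment:
            # Post-alpha moment: deterministically end slow thinking.
            out.append(END_THINK)
            thinking = False
            continue
        tok = next_token(out)
        out.append(tok)
        if tok == END_THINK:
            # The model ended its thinking phase on its own.
            thinking = False
        elif thinking and rng.random() < p_slow:
            # Pre-alpha moment: Bernoulli-scheduled slow-thinking transition.
            out.append(SLOW_TOKEN)
    return out[:max_tokens]


def toy_model(context: List[str]) -> str:
    """Stand-in for a reasoning model: emits a placeholder token."""
    return f"t{len(context)}"


if __name__ == "__main__":
    tokens = modulate(toy_model, alpha=1.0, base_think_len=20,
                      p_slow=0.3, max_tokens=40, rng=random.Random(0))
    print(" ".join(tokens))
```

In this sketch, a larger `alpha` lengthens the phase in which transition tokens can be inserted, and `p_slow` controls how densely they appear; setting `p_slow` to 0 leaves only the deterministic termination step.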