CL | AI | Oct 11, 2025

MatryoshkaThinking: Recursive Test-Time Scaling Enables Efficient Reasoning

arXiv:2510.10293v1 | 1 citation | h-index: 22 | Has Code
Originality: Highly original
AI Analysis

This work addresses inefficient inference for users of large language models. The method is rated as novel rather than incremental, although it builds on existing test-time scaling paradigms.

The paper tackles the computational overhead of test-time scaling in language models by proposing MatryoshkaThinking, which reduces computation by 96% compared to DeepConf while achieving a state-of-the-art score of 99.79 on AIME2025.

Test-time scaling has emerged as a promising paradigm in language modeling, wherein additional computational resources are allocated during inference to enhance model performance. Recent approaches, such as DeepConf, have demonstrated the efficacy of this strategy; however, they often incur substantial computational overhead to achieve competitive results. In this work, we propose MatryoshkaThinking, a novel method that significantly reduces computational cost while maintaining state-of-the-art performance. Specifically, MatryoshkaThinking attains a score of 99.79 on AIME2025 using only 4% of the computation required by DeepConf. The core of our approach lies in the recursive exploitation of the model's intrinsic capabilities in reasoning, verification, and summarization, which collectively enhance the retention of correct solutions and reduce the disparity between Pass@k and Pass@1. Comprehensive evaluations across multiple open-source models and challenging multi-modal reasoning benchmarks validate the effectiveness and generality of our method. These findings offer new insights into the design of efficient and scalable test-time inference strategies for advanced language models.
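To make the "disparity between Pass@k and Pass@1" concrete, here is a minimal sketch of the commonly used unbiased pass@k estimator, which computes the chance that at least one of k samples drawn from n generated solutions is correct. This is not taken from the paper; the function names `pass_at_k` and `pass_gap` and the example counts are illustrative assumptions.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled solutions, of which c are correct.

    Uses the standard combinatorial estimator 1 - C(n-c, k) / C(n, k):
    the probability that a random size-k subset contains a correct solution.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_gap(n: int, c: int, k: int) -> float:
    """Gap between pass@k and pass@1 (hypothetical helper for illustration)."""
    return pass_at_k(n, c, k) - pass_at_k(n, c, 1)


if __name__ == "__main__":
    # Illustrative numbers only: 10 samples, 3 of them correct.
    print(pass_at_k(10, 3, 1))
    print(pass_at_k(10, 3, 4))
    print(pass_gap(10, 3, 4))
```

A large gap means the model often produces a correct solution somewhere among its samples but fails to surface it as the single final answer. According to the abstract, this is the gap that the verification and summarization steps are meant to shrink.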

