AI · May 12

Nice Fold or Hero Call: Learning Budget-Efficient Thinking for Adaptive Reasoning

Zhaomeng Zhou, Lan Zhang, Junyang Wang, Mu Yuan, Junda Lin

arXiv:2605.11625 · 88.4

Predicted impact: top 22% in AI · last 90 days
Originality: Incremental advance

AI Analysis

For practitioners deploying large reasoning models, BET provides a practical method to reduce inference cost without sacrificing accuracy by learning when to abstain or invest compute based on solvability.

Large reasoning models often waste compute on unsolvable queries or compress hard-but-solvable ones. Budget-Efficient Thinking (BET) reduces reasoning tokens by ~55% on average while improving performance across seven benchmarks and three base models, and transfers zero-shot to scientific QA and logical reasoning.

Large reasoning models (LRMs) improve problem solving through extended reasoning, but often misallocate test-time compute. Existing efficiency methods reduce cost by compressing reasoning traces or conditioning budget on perceived difficulty, yet largely overlook solvability. As a result, they may spend large budgets on queries beyond the model's capability while compressing hard-but-solvable queries that require deeper reasoning. In this work, we formulate adaptive reasoning as a computational investment under uncertainty, where budget should follow the expected return of reasoning rather than perceived difficulty alone. To instantiate this principle, we propose Budget-Efficient Thinking (BET), a two-stage framework that combines behavioral cold-start with GRPO under an investment-cost-aware reward. By aligning solve-or-fold decisions with rollout-derived solvability, BET learns three behaviors: (1) short solve, answering easy queries concisely; (2) nice fold, abstaining early when continued reasoning has near-zero expected return; and (3) hero call, preserving sufficient compute for hard-but-solvable queries. Across seven benchmarks and three base models, BET reduces reasoning tokens by ~55% on average while achieving overall performance improvements, and transfers zero-shot from mathematical reasoning to scientific QA and logical reasoning with comparable efficiency gains.
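The abstract does not spell out the reward or the solvability estimator, so the following is only a minimal sketch of the idea under stated assumptions: solvability is taken to be the fraction of sampled rollouts that answer correctly, and the three behaviors are assigned by hypothetical thresholds. The names `Rollout`, `target_behavior`, `toy_reward`, and every default value (`fold_below`, `short_tokens`, `cost_weight`, `max_tokens`) are illustrative and not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Rollout:
    correct: bool  # did this sampled reasoning trace reach the right answer?
    tokens: int    # reasoning tokens spent by this trace


def solvability(rollouts):
    """Assumed estimator: fraction of sampled rollouts that were correct."""
    if not rollouts:
        raise ValueError("need at least one rollout")
    return sum(r.correct for r in rollouts) / len(rollouts)


def target_behavior(rollouts, fold_below=0.05, short_tokens=512):
    """Map a query's rollouts to one of the three behaviors (hypothetical rule)."""
    p = solvability(rollouts)
    correct_lengths = [r.tokens for r in rollouts if r.correct]
    if p <= fold_below or not correct_lengths:
        return "nice_fold"    # near-zero expected return: abstain early
    if mean(correct_lengths) <= short_tokens:
        return "short_solve"  # easy query: answer concisely
    return "hero_call"        # hard but solvable: keep enough budget


def toy_reward(correct, folded, tokens, p_solve, max_tokens=8192, cost_weight=0.5):
    """Illustrative investment-cost-aware reward, NOT the paper's formula."""
    cost = cost_weight * min(tokens, max_tokens) / max_tokens
    if folded:
        # Folding pays off only when solving was unlikely in the first place.
        return (1.0 - p_solve) - cost
    return (1.0 if correct else 0.0) - cost
```

Under this toy rule, a query on which every rollout fails is labeled `nice_fold`, while a query that is solved only by long traces is labeled `hero_call`, so its budget is preserved rather than compressed.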
