AI | Jan 29, 2025

Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization

arXiv:2501.17974v2 | 27 citations | h-index: 21 | ICML
Originality: Incremental advance
AI Analysis

The paper addresses inefficient reasoning in AI systems that solve mathematical problems, offering a domain-specific, incremental improvement.

The paper tackles the problem of large language models using unnecessarily long reasoning chains for trivial questions. It proposes Inference Budget-Constrained Policy Optimization (IBPO), which lets models allocate inference budgets based on query difficulty. Relative to LLaMA3.1 8B Instruct, the resulting models improve absolute accuracy on MATH500 by 4.14% and 5.74% when using 2.16x and 4.32x inference budgets, respectively.

Solving mathematics problems has been an intriguing capability of large language models, and many efforts have been made to improve reasoning by extending reasoning length, such as through self-correction and extensive long chain-of-thoughts. While promising in problem-solving, advanced long reasoning chain models exhibit an undesired single-modal behavior, where trivial questions require unnecessarily tedious long chains of thought. In this work, we propose a way to allow models to be aware of inference budgets by formulating it as utility maximization with respect to an inference budget constraint, hence naming our algorithm Inference Budget-Constrained Policy Optimization (IBPO). In a nutshell, models fine-tuned through IBPO learn to "understand" the difficulty of queries and allocate inference budgets to harder ones. With different inference budgets, our best models are able to have a 4.14% and 5.74% absolute improvement (8.08% and 11.2% relative improvement) on MATH500 using 2.16x and 4.32x inference budgets respectively, relative to LLaMA3.1 8B Instruct. These improvements are approximately 2x those of self-consistency under the same budgets.
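The abstract reports each gain twice, once as an absolute and once as a relative improvement, so the two pairs of figures can be checked against each other. The sketch below is a reader's consistency check, not part of the paper: it assumes that relative improvement is defined as absolute improvement divided by baseline accuracy, and the function name `implied_baseline` is hypothetical. The baseline accuracy itself is not stated in the abstract; the value computed here is only what the rounded figures imply.

```python
def implied_baseline(absolute_gain_pct: float, relative_gain_pct: float) -> float:
    """Baseline accuracy (in %) implied by a pair of reported gains.

    Assumes relative gain = absolute gain / baseline, so
    baseline = absolute gain / (relative gain / 100).
    """
    return absolute_gain_pct / (relative_gain_pct / 100.0)


# Figures quoted in the abstract: budget multiplier -> (absolute %, relative %).
reported = {
    2.16: (4.14, 8.08),
    4.32: (5.74, 11.2),
}

for budget, (abs_gain, rel_gain) in reported.items():
    base = implied_baseline(abs_gain, rel_gain)
    print(f"{budget}x budget: implied baseline accuracy ~ {base:.2f}%")
```

Both pairs imply a baseline of roughly 51.2% on MATH500, so the absolute and relative figures are mutually consistent up to rounding.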

