AI, Jun 2

The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs

Xu Wan, Speed Zhu, Jianwei Cai, Guang Chen, XiMing Huang, Wiggin Zhou, Mingyang Sun

arXiv:2606.0309253.1

AI Analysis

This work addresses the practical problem of allocating limited computational budgets across reasoning queries for LLM deployment, offering a principled economic approach.

The paper formulates inference budget allocation for LLMs as a constrained optimization problem, deriving an optimal policy based on a shadow price. Their proposed method, CLEAR, achieves up to a 3x improvement in global accuracy over uniform allocation under resource scarcity.

Inference-time scaling has emerged as a critical avenue for enhancing Large Language Models' performance, yet real-world deployment is constrained by strict computational budgets. In this work, we formulate inference budget allocation as a global constrained optimization problem governed by economic principles. By modeling per-query reasoning utility with a shifted-surge function, we derive an optimal allocation policy based on a global shadow price that equilibrates marginal utility under resource scarcity. Based on this theory, we propose Constrained Latent-utility Equilibrium Allocation for Reasoning (CLEAR). It performs rational abandonment and reallocates resources from insolvent queries to solvable queries near their emergence thresholds. Extensive experiments on several reasoning tasks with different traffic streams demonstrate that CLEAR significantly improves the Pareto frontier of total token cost versus mean accuracy. In resource-scarce regimes, CLEAR achieves up to a 3x improvement in global accuracy compared to uniform allocation.
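The shadow-price idea in the abstract can be illustrated with a small sketch. It is not the paper's CLEAR implementation. The abstract does not give the form of the shifted-surge function, so the code assumes a stand-in utility u(b) = a(1 - exp(-k(b - tau))) for b > tau and 0 otherwise, where a (ceiling accuracy), k (growth rate) and tau (emergence threshold) are hypothetical per-query parameters. Under a price lam per token, each query either takes the budget where marginal utility equals lam or is abandoned when its net surplus is not positive. The function names `best_response` and `allocate` are likewise illustrative.

```python
import math

def best_response(a, k, tau, lam):
    """Budget maximizing u(b) - lam*b for one query, under the ASSUMED utility
    u(b) = a*(1 - exp(-k*(b - tau))) for b > tau, else 0."""
    if a * k <= lam:
        return 0.0  # marginal utility never reaches the price: abandon
    # Interior optimum: u'(b) = a*k*exp(-k*(b - tau)) = lam
    b = tau + math.log(a * k / lam) / k
    surplus = a * (1.0 - lam / (a * k)) - lam * b
    return b if surplus > 0 else 0.0  # rational abandonment if not worth it

def allocate(queries, total_budget, iters=100):
    """Bisect on the shadow price lam until total spending fits the budget.
    queries: list of (a, k, tau) tuples (hypothetical parameters)."""
    lo = 1e-12
    hi = max(a * k for a, k, _ in queries)  # at this price nothing is funded
    for _ in range(iters):
        lam = math.sqrt(lo * hi)  # geometric midpoint, prices span many decades
        spent = sum(best_response(a, k, tau, lam) for a, k, tau in queries)
        if spent > total_budget:
            lo = lam  # over budget: raise the price
        else:
            hi = lam  # feasible: try a lower price
    # hi is always a feasible price, so the returned allocation fits the budget
    return hi, [best_response(a, k, tau, hi) for a, k, tau in queries]

if __name__ == "__main__":
    queries = [(1.0, 0.01, 100), (1.0, 0.01, 2000), (0.9, 0.02, 50)]
    lam, alloc = allocate(queries, total_budget=1000)
    print(lam, alloc)
```

In this toy setting the query with tau = 2000 cannot be solved within the total budget of 1000 tokens, so it receives nothing and the other two queries share the budget. Because the assumed utility is not concave, a single price may leave part of the budget unspent; handling that slack is outside the scope of this sketch.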
