Test-Time Compute Games
The paper addresses a market inefficiency that affects users of cloud-based LLM services and offers a mechanism to reduce overcharging. The contribution is incremental, since it applies established auction theory to a new domain.
The paper tackles the social inefficiency of LLM-as-a-service markets, where providers have an incentive to overuse test-time compute because it raises what they charge users. It proposes a reverse second-price auction mechanism that aligns payments with marginal value, and it reports experiments with Llama and Qwen models on math and science benchmarks.
Test-time compute has emerged as a promising strategy to enhance the reasoning abilities of large language models (LLMs). However, this strategy has in turn increased how much users pay cloud-based providers offering LLM-as-a-service, since providers charge users for the amount of test-time compute they use to generate an output. In our work, we show that the market of LLM-as-a-service is socially inefficient: providers have a financial incentive to increase the amount of test-time compute, even if this increase contributes little to the quality of the outputs. To address this inefficiency, we introduce a reverse second-price auction mechanism where providers bid their offered price and (expected) quality for the opportunity to serve a user, and users pay proportionally to the marginal value generated by the winning provider relative to the second-highest bidder. To illustrate and complement our theoretical results, we conduct experiments with multiple instruct models from the $\texttt{Llama}$ and $\texttt{Qwen}$ families, as well as reasoning models distilled from $\texttt{DeepSeek-R1}$, on math and science benchmark datasets.
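To make the auction concrete, the following is a minimal sketch of one way such a mechanism could be scored, not the exact rule from the paper. It assumes a user whose value is linear in expected quality (the hypothetical parameter `value_per_quality`) and uses a standard second-score payment: the winner is the provider offering the highest surplus (value minus price), and the user pays the winner's value minus the runner-up's surplus. The names `Bid` and `run_auction` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    provider: str
    price: float    # price the provider asks for serving the query
    quality: float  # expected output quality, e.g. probability of a correct answer


def run_auction(bids, value_per_quality):
    """Pick a winner and a payment with a second-score rule (an assumption).

    The user's value for a bid is assumed to be value_per_quality * quality.
    """
    if len(bids) < 2:
        raise ValueError("a second-price rule needs at least two bids")

    def surplus(bid):
        # Surplus the user would get if they paid exactly the asked price.
        return value_per_quality * bid.quality - bid.price

    ranked = sorted(bids, key=surplus, reverse=True)
    winner, runner_up = ranked[0], ranked[1]

    # The user pays the winner's value minus the surplus the runner-up offered,
    # so the payment reflects the winner's marginal value over the next-best bid.
    payment = value_per_quality * winner.quality - surplus(runner_up)
    return winner, payment


bids = [
    Bid("A", price=2.0, quality=0.90),
    Bid("B", price=1.0, quality=0.70),
    Bid("C", price=3.0, quality=0.95),
]
winner, payment = run_auction(bids, value_per_quality=10.0)
print(winner.provider, round(payment, 2))
```

Under this rule the payment is never below the winner's asked price, because the winner's surplus is at least the runner-up's. A provider that inflates test-time compute, and therefore its price, without a matching gain in quality lowers its own surplus and risks losing the auction.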