More Bang for the Buck: Improving the Inference of Large Language Models at a Fixed Budget using Reset and Discard (ReD)

arXiv:2601.21522v1h-index: 29
Originality Incremental advance
AI Analysis

This addresses the efficiency of LLM inference for users with budget constraints, offering a practical improvement but is incremental as it builds on existing metrics and methods.

The paper tackles the problem of diminishing returns in LLM inference at a fixed budget by proposing Reset-and-Discard (ReD), a query method that increases coverage@cost, reducing required attempts, tokens, and USD cost by substantial amounts in experiments on HumanEval with three LLMs.

The performance of large language models (LLMs) on verifiable tasks is usually measured by pass@k, the probability of answering a question correctly at least once in k trials. At a fixed budget, a more suitable metric is coverage@cost, the average number of unique questions answered as a function of the total number of attempts. We connect the two metrics and show that the empirically-observed power-law behavior in pass@k leads to a sublinear growth of the coverage@cost (diminishing returns). To solve this problem, we propose Reset-and-Discard (ReD), a query method of LLMs that increases coverage@cost for any given budget, regardless of the pass@k form. Moreover, given a pass@k, we can quantitatively predict the savings in the total number of attempts using ReD. If pass@k is not available for the model, ReD can infer its power-law exponent. Experiments on three LLMs using HumanEval demonstrate that ReD substantially reduces the required attempts, tokens, and USD cost to reach a desired coverage, while also offering an efficient way to measure inference power-laws.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes