MLAILGSep 25, 2025

Best-of-$\infty$ -- Asymptotic Performance of Test-Time Compute

arXiv:2509.21091v12 citationsh-index: 14
Originality Incremental advance
AI Analysis

This addresses the problem of high inference-time costs for researchers and practitioners using LLMs, though it is incremental as it builds on existing best-of-N methods.

The paper tackles the computational inefficiency of using infinite test-time compute for majority voting in LLMs by proposing an adaptive generation scheme that selects the number of samples based on answer agreement, and extends this to weighted ensembles of multiple LLMs with optimal weighting computed via mixed-integer linear programming. Experiments show the approach effectively improves performance.

We study best-of-$N$ for large language models (LLMs) where the selection is based on majority voting. In particular, we analyze the limit $N \to \infty$, which we denote as Best-of-$\infty$. While this approach achieves impressive performance in the limit, it requires an infinite test-time budget. To address this, we propose an adaptive generation scheme that selects $N$ based on answer agreement, thereby efficiently allocating inference-time computation. Beyond adaptivity, we extend the framework to weighted ensembles of multiple LLMs, showing that such mixtures can outperform any individual model. The optimal ensemble weighting is formulated and efficiently computed as a mixed-integer linear program. Extensive experiments demonstrate the effectiveness of our approach.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes