Asymptotically Optimal Sequential Testing with Heterogeneous LLMs

Guokai Li, Jiaxin Liang, Mo Liu, Yanzhe Lei, Stefanus Jasin, Fenghua Yang, Preet Baxi

arXiv:2604.01086

AI Analysis

This work addresses efficient resource allocation in sequential decision-making with LLMs, offering an asymptotically optimal solution for scenarios with asymmetric information rates, though it is incremental in extending classical sequential testing to modern LLM contexts.

The paper tackles the problem of Bayesian binary sequential hypothesis testing using multiple large language models (LLMs) with heterogeneous costs and asymmetric accuracies, proving that as error tolerance approaches zero, the optimal policy asymptotically uses at most two LLMs and matches a universal lower bound up to a (1+o(1)) factor.

We study a Bayesian binary sequential hypothesis testing problem with multiple large language models (LLMs). Each LLM $j$ has per-query cost $c_j>0$, random waiting time with mean $\mu_j>0$ and sub-Gaussian tails, and \emph{asymmetric} accuracies: the probability of returning the correct label depends on the true hypothesis $\theta\in\{A,B\}$ and need not be the same under $A$ and $B$. This asymmetry induces two distinct information rates $(I_{j,A}, I_{j,B})$ per LLM, one under each hypothesis. The decision-maker chooses LLMs sequentially, observes their noisy binary answers, and stops when the posterior probability of one hypothesis exceeds $1-\alpha$. The objective is to minimize the sum of expected query cost and expected waiting cost, $\mathbb{E}[C_\tau] + \mathbb{E}[g(W_\tau)]$, where $C_\tau$ is the total query cost, $W_\tau$ is the total waiting time, and $g$ is a polynomial function (e.g., $g(x)=x^\rho$ with $\rho\ge 1$). We prove that as the error tolerance $\alpha\to 0$, the optimal policy is asymptotically equivalent to one that uses at most two LLMs. In this case, a single-LLM policy is \emph{not} generically optimal: optimality now requires exploiting a two-dimensional tradeoff between information under $A$ and information under $B$. Any admissible policy induces an expected information-allocation vector in $\mathbb{R}_+^2$, and we show that the optimal allocation lies at an extreme point of the associated convex set when $\alpha$ is sufficiently small, and hence uses at most two LLMs. We construct belief-dependent policies that first mix between two LLMs when the posterior is ambiguous, and then switch to a single ``specialist'' LLM when the posterior is sufficiently close to one of the hypotheses. These policies match the universal lower bound up to a $(1+o(1))$ factor as $\alpha\to 0$.
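The setup described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' policy or code: the two LLMs, their accuracies and costs, the mixing thresholds, and the uniform mixing rule are all hypothetical values chosen for illustration. It also assumes that the per-query information rates are the KL divergences between the answer distributions under the two hypotheses, and it ignores the waiting-time cost $g(W)$ entirely, tracking only query cost.

```python
import math
import random

# Hypothetical LLMs (illustrative numbers, not from the paper).
# p_a = P(answer "A" | truth A), p_b = P(answer "B" | truth B).
LLMS = {
    "llm_1": {"p_a": 0.95, "p_b": 0.65, "cost": 1.0},
    "llm_2": {"p_a": 0.65, "p_b": 0.95, "cost": 1.0},
}


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def info_rates(llm):
    """Assumed information rates (I_A, I_B), modelled here as KL divergences."""
    # Under A the answer is "A" w.p. p_a; under B it is "A" w.p. 1 - p_b.
    i_a = bernoulli_kl(llm["p_a"], 1 - llm["p_b"])
    i_b = bernoulli_kl(llm["p_b"], 1 - llm["p_a"])
    return i_a, i_b


def update_posterior(post_a, llm, answer):
    """Bayes update of P(theta = A) after one binary answer ("A" or "B")."""
    like_a = llm["p_a"] if answer == "A" else 1 - llm["p_a"]
    like_b = 1 - llm["p_b"] if answer == "A" else llm["p_b"]
    num = post_a * like_a
    return num / (num + (1 - post_a) * like_b)


def belief_policy(post_a, rng, low=0.2, high=0.8):
    """Hypothetical rule: mix when the posterior is ambiguous, else specialise."""
    names = list(LLMS)
    if low <= post_a <= high:
        return rng.choice(names)  # ambiguous posterior: mix uniformly
    # Index 0 selects I_A (leaning towards A), index 1 selects I_B.
    idx = 0 if post_a > high else 1
    return max(names, key=lambda n: info_rates(LLMS[n])[idx] / LLMS[n]["cost"])


def run_test(truth, alpha, policy, rng, prior_a=0.5):
    """Query LLMs until one hypothesis has posterior probability above 1 - alpha."""
    post_a, total_cost, n_queries = prior_a, 0.0, 0
    while alpha <= post_a <= 1 - alpha:
        llm = LLMS[policy(post_a, rng)]
        p_correct = llm["p_a"] if truth == "A" else llm["p_b"]
        wrong = "B" if truth == "A" else "A"
        answer = truth if rng.random() < p_correct else wrong
        post_a = update_posterior(post_a, llm, answer)
        total_cost += llm["cost"]
        n_queries += 1
    decision = "A" if post_a > 1 - alpha else "B"
    return decision, total_cost, n_queries


rng = random.Random(0)
decision, cost, n_queries = run_test("A", alpha=0.01, policy=belief_policy, rng=rng)
print(decision, cost, n_queries)
```

Lowering `alpha` in `run_test` lengthens the test, which is the regime the asymptotic result concerns. Comparing the average cost of `belief_policy` with that of a policy that always returns a single LLM name gives a rough feel for the two-dimensional tradeoff, though nothing in this sketch reproduces the paper's lower bound or its optimal allocation.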
