Multi-LLM Query Optimization

Arlen Dean, Zijin Zhang, Stefanus Jasin, Yuqing Liu

arXiv:2603.2461797.4h-index: 19

AI Analysis

This work addresses a practical challenge in deploying multiple LLMs for classification, offering a robust optimization framework with theoretical guarantees, though it is incremental in improving query planning efficiency.

The paper tackles the problem of optimally allocating queries across multiple large language models to minimize cost while guaranteeing reliability for every possible ground-truth label, and it develops a surrogate method with an asymptotic fully polynomial-time approximation scheme that achieves a cost ratio converging to one as error tolerances shrink.

Deploying multiple large language models (LLMs) in parallel to classify an unknown ground-truth label is a common practice, yet the problem of optimally allocating queries across heterogeneous models remains poorly understood. In this paper, we formulate a robust, offline query-planning problem that minimizes total query cost subject to statewise error constraints which guarantee reliability for every possible ground-truth label. We first establish that this problem is NP-hard via a reduction from the minimum-weight set cover problem. To overcome this intractability, we develop a surrogate by combining a union bound decomposition of the multi-class error into pairwise comparisons with Chernoff-type concentration bounds. The resulting surrogate admits a closed-form, multiplicatively separable expression in the query counts and is guaranteed to be feasibility-preserving. We further show that the surrogate is asymptotically tight at the optimization level: the ratio of surrogate-optimal cost to true optimal cost converges to one as error tolerances shrink, with an explicit rate of $O\left(\log\log(1/Î±_{\min}) / \log(1/Î±_{\min})\right)$. Finally, we design an asymptotic fully polynomial-time approximation scheme (AFPTAS) that returns a surrogate-feasible query plan within a $(1+\varepsilon)$ factor of the surrogate optimum.

View on arXiv PDF

Similar