LG | AI | CL | DB | Jun 28, 2025

BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute

arXiv:2506.22716v1 | 42 citations | h-index: 73 | ICML
Originality: Incremental advance
AI Analysis

This work addresses the cost-efficiency of LLM deployment at scale, offering a practical solution for users who need to balance performance and expense, though it builds incrementally on existing routing ideas.

The paper tackles the problem of high deployment costs for large language models (LLMs) by introducing BEST-Route, a routing framework that adaptively selects models and response counts based on query difficulty, achieving up to 60% cost reduction with less than 1% performance drop.

Large language models (LLMs) are powerful tools but are often expensive to deploy at scale. LLM query routing mitigates this by dynamically assigning queries to models of varying cost and quality to obtain a desired trade-off. Prior query routing approaches generate only one response from the selected model. Because a single response from a small (inexpensive) model is often not good enough to beat a response from a large (expensive) model, these approaches end up overusing the large model and missing out on potential cost savings. However, it is well known that for small models, generating multiple responses and selecting the best can enhance quality while remaining cheaper than a single large-model response. We leverage this idea to propose BEST-Route, a novel routing framework that chooses a model and the number of responses to sample from it based on query difficulty and the quality thresholds. Experiments on real-world datasets demonstrate that our method reduces costs by up to 60% with less than 1% performance drop.
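
The routing idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `Option` class, the `route` and `best_of_n` helpers, the cost figures, and the rule "pick the cheapest option whose predicted quality reaches a fraction of the large model's predicted quality" are all assumptions made for illustration. In practice the quality predictions would come from a learned predictor; here they are hard-coded toy numbers.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Option:
    """One routing choice (hypothetical): a model and a sample count."""
    model: str
    n_samples: int
    cost: float  # assumed total cost of generating n_samples responses


def route(options: Sequence[Option],
          predicted_quality: Callable[[Option], float],
          reference: Option,
          threshold: float) -> Option:
    """Return the cheapest option whose predicted quality is at least
    `threshold` times the reference option's predicted quality.
    Falls back to the reference option if none qualifies."""
    target = threshold * predicted_quality(reference)
    feasible = [o for o in options if predicted_quality(o) >= target]
    return min(feasible, key=lambda o: o.cost) if feasible else reference


def best_of_n(responses: Sequence[str], score: Callable[[str], float]) -> str:
    """Select the highest-scoring response among the sampled candidates."""
    return max(responses, key=score)


# Toy setup: four samples from the small model still cost less than
# one response from the large model (assumed costs).
small_1 = Option("small", 1, 1.0)
small_4 = Option("small", 4, 4.0)
large_1 = Option("large", 1, 10.0)
options = [small_1, small_4, large_1]

# Assumed per-query quality predictions for three difficulty levels.
easy = {small_1: 0.90, small_4: 0.95, large_1: 0.92}
medium = {small_1: 0.70, small_4: 0.88, large_1: 0.90}
hard = {small_1: 0.40, small_4: 0.55, large_1: 0.90}

choice_easy = route(options, easy.get, large_1, threshold=0.95)
choice_medium = route(options, medium.get, large_1, threshold=0.95)
choice_hard = route(options, hard.get, large_1, threshold=0.95)
```

With these toy numbers, the easy query is served by a single small-model response, the medium query by best-of-4 sampling from the small model, and the hard query by the large model. Lowering `threshold` trades quality for cost by making the cheaper options feasible more often.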

