LG · May 14

Latency-Quality Routing for Functionally Equivalent Tools in LLM Agents

arXiv:2605.14241 · 56.7
Predicted impact: top 41% in LG · last 90 days · Originality: Incremental advance
AI Analysis

This work addresses the practical problem of routing queries among multiple functionally equivalent tool providers in LLM agents, where latency and quality vary, offering a method that avoids additive-reward collapse under load.

LQM-ContextRoute, a contextual bandit router for functionally equivalent tool providers, improves F1 by +2.18 pp over SW-UCB on web-search benchmarks and up to +18 pp accuracy in heterogeneous settings by treating latency as service capacity rather than an additive reward.

Tool-augmented LLM agents increasingly access the same tool type through multiple functionally equivalent providers, such as web-search APIs, retrievers, or LLM backends exposed behind a shared interface. This creates a provider-routing problem under runtime load: the router must choose among providers that differ in latency, reliability, and answer quality, often without gold labels at deployment time. We introduce LQM-ContextRoute, a contextual bandit router for same-function tool providers. Its key design is latency-quality matching: instead of letting low latency offset poor answers in an additive reward, the router ranks providers by expected answer quality per service cycle. It combines this capacity-aware score with query-specific quality estimation and LLM-as-judge feedback, allowing it to adapt online to both load changes and provider-quality differences. On the main web-search load benchmark, LQM-ContextRoute improves F1 by +2.18 pp over SW-UCB while staying on the latency-quality frontier. In a high-heterogeneity StrategyQA setting, LQM-ContextRoute avoids additive-reward collapse and improves accuracy by up to +18 pp over SW-UCB; on heterogeneous retriever pools, it improves NDCG by +2.91 to +3.22 pp over SW-UCB. These results show that same-function tool routing benefits from treating latency as service capacity, especially when runtime pressure and provider-quality heterogeneity coexist.
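The contrast between an additive reward and a quality-per-service-cycle score can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not give the exact scoring formula, so the ratio form `quality / cycles`, the `latency_weight` and `cycle_s` parameters, the `ProviderStats` type, and the provider numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProviderStats:
    """Hypothetical running estimates for one provider."""
    name: str
    quality: float    # estimated answer quality in [0, 1], e.g. from judge feedback
    latency_s: float  # recent mean latency in seconds


def additive_score(p: ProviderStats, latency_weight: float = 0.5) -> float:
    # Additive reward: low latency can offset poor answers.
    return p.quality - latency_weight * p.latency_s


def capacity_score(p: ProviderStats, cycle_s: float = 1.0) -> float:
    # Assumed ratio form: expected quality per service cycle consumed.
    # A provider with zero quality scores zero no matter how fast it is.
    cycles = max(p.latency_s / cycle_s, 1e-9)
    return p.quality / cycles


def route(providers, score) -> str:
    # Greedy choice; a bandit router would add exploration on top of this.
    return max(providers, key=score).name


# Made-up numbers: the strong provider is slowed down by load.
providers = [
    ProviderStats("fast_weak", quality=0.2, latency_s=1.0),
    ProviderStats("slow_strong", quality=0.9, latency_s=3.0),
]

additive_choice = route(providers, additive_score)
capacity_choice = route(providers, capacity_score)
print(additive_choice, capacity_choice)
```

With these made-up numbers the additive reward prefers the weak provider because the latency penalty swamps the quality gap, while the ratio score still prefers the strong one. The sketch leaves out the query-specific quality estimation, the LLM-as-judge feedback loop, and the online adaptation that the abstract describes.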
