LG · Feb 2

Learning to Route and Schedule LLMs from User Retrials via Contextual Queueing Bandits

arXiv:2602.02061v1 · 1 citations · h-index: 1

Originality: Incremental advance

AI Analysis

The paper addresses queue management for conversational LLM services and offers an incremental improvement over existing online algorithms.

The paper tackles the problem of efficiently routing and scheduling user queries to LLMs in server queues by developing an algorithm that learns from user retrial behaviors, achieving cumulative regret of Õ(√t) for routing and queue length regret of Õ(t^{-1/4}).

Explosive demand for LLMs often causes user queries to accumulate in server queues, requiring efficient routing (query-LLM matching) and scheduling (query prioritization) mechanisms. Several online algorithms are being deployed, but they overlook two key challenges inherent to conversational LLM services: (1) unsatisfied users may retry queries, increasing the server backlog, and (2) requests for "explicit" feedback, such as ratings, degrade the user experience. In this paper, we develop a joint routing and scheduling algorithm that leverages "implicit" feedback inferred from user retrial behaviors. The key idea is to propose and study the framework of contextual queueing bandits with multinomial logit feedback (CQB-MNL). CQB-MNL models query retrials as well as context-based learning of user preferences over LLMs. Our algorithm, anytime CQB (ACQB), achieves efficient learning while maintaining queue stability by combining Thompson sampling with forced exploration at a decaying rate. We show that ACQB simultaneously achieves a cumulative regret of $\widetilde{\mathcal{O}}(\sqrt{t})$ for routing and a queue length regret of $\widetilde{\mathcal{O}}(t^{-1/4})$ for any large $t$. For experiments, we refine query embeddings via contrastive learning while adopting a disjoint parameter model to learn LLM-specific parameters. Experiments on the SPROUT, EmbedLLM, and RouterBench datasets confirm that both algorithms consistently outperform baselines.
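The combination of Thompson sampling with forced exploration at a decaying rate can be illustrated with a toy sketch. This is not the paper's ACQB algorithm: it swaps the multinomial logit feedback model for a simpler linear-Gaussian posterior per LLM (a disjoint parameter model), uses FIFO order in place of a learned scheduling rule, and assumes a hypothetical exploration schedule of t^(-1/2). All names (`explore_prob`, `DisjointThompsonRouter`, `simulate`) and all numeric settings are illustrative assumptions. The only feedback used is implicit: whether the user retried.

```python
import numpy as np


def explore_prob(t, alpha=0.5):
    """Hypothetical forced-exploration probability, decaying as t^(-alpha)."""
    return min(1.0, t ** (-alpha))


class DisjointThompsonRouter:
    """Toy router: one linear-Gaussian posterior per LLM (assumed, not MNL)."""

    def __init__(self, n_llms, dim, rng, scale=1.0):
        self.rng = rng
        self.scale = scale
        # Per-LLM precision matrix A_k and response vector b_k (ridge prior).
        self.A = np.stack([np.eye(dim) for _ in range(n_llms)])
        self.b = np.zeros((n_llms, dim))

    def route(self, x, t):
        n_llms = self.b.shape[0]
        # Forced exploration: pick a uniformly random LLM with decaying probability.
        if self.rng.random() < explore_prob(t):
            return int(self.rng.integers(n_llms))
        scores = []
        for k in range(n_llms):
            mean = np.linalg.solve(self.A[k], self.b[k])
            # Sample theta ~ N(mean, scale^2 * A_k^{-1}) via a Cholesky factor of A_k.
            chol = np.linalg.cholesky(self.A[k])
            z = self.rng.standard_normal(x.shape[0])
            theta = mean + self.scale * np.linalg.solve(chol.T, z)
            scores.append(float(x @ theta))
        return int(np.argmax(scores))

    def update(self, k, x, satisfied):
        # Implicit feedback only: 1 if the user did not retry, 0 if they did.
        self.A[k] += np.outer(x, x)
        self.b[k] += float(satisfied) * x


def simulate(T=2000, seed=0):
    """Single-server toy queue in which unsatisfied users rejoin the backlog."""
    rng = np.random.default_rng(seed)
    dim, n_llms = 3, 2
    # Assumed ground-truth preferences, used only to generate synthetic retrials.
    true_theta = np.array([[2.0, -2.0, 0.0], [-2.0, 2.0, 0.0]])
    router = DisjointThompsonRouter(n_llms, dim, rng)
    queue, served_ok = [], 0
    for t in range(1, T + 1):
        if rng.random() < 0.5:  # a new query arrives with probability 0.5
            queue.append(rng.standard_normal(dim))
        if not queue:
            continue
        x = queue.pop(0)  # FIFO placeholder for the scheduling decision
        k = router.route(x, t)
        p_satisfied = 1.0 / (1.0 + np.exp(-(x @ true_theta[k])))
        satisfied = rng.random() < p_satisfied
        router.update(k, x, satisfied)
        if satisfied:
            served_ok += 1
        else:
            queue.append(x)  # retrial: the same query re-enters the queue
    return served_ok, len(queue)


if __name__ == "__main__":
    ok, backlog = simulate()
    print(f"satisfied queries: {ok}, final backlog: {backlog}")
```

In this sketch a poor routing decision has a direct queueing cost, because the retried query consumes another service slot; this is the coupling between learning and backlog that the abstract describes.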
