Reward-Based Online LLM Routing via NeuralUCB
This work addresses cost efficiency in LLM routing, but the contribution is incremental: it applies the existing NeuralUCB algorithm to the routing problem rather than introducing a new method.
The study addresses cost-aware large language model routing by implementing a NeuralUCB-based policy and evaluating it on RouterBench in a simulated online setting. The policy outperformed random and min-cost baselines in utility reward, and it incurred lower inference cost than a max-quality reference while keeping reward competitive.
This study investigates the use of NeuralUCB for cost-aware large language model (LLM) routing. Existing routing approaches can be broadly grouped into supervised routing methods and partial-feedback methods, each with different tradeoffs in efficiency and adaptivity. We implement a NeuralUCB-based routing policy and evaluate it on RouterBench under a simulated online setting. Experimental results show that the proposed method consistently outperforms random and min-cost baselines in utility reward. Compared with the max-quality reference, our method achieves substantially lower inference cost while maintaining competitive reward. These findings suggest that NeuralUCB is a promising approach for cost-aware LLM routing, while also highlighting remaining challenges in action discrimination and exploration.
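As a rough illustration of the kind of policy the abstract describes, the following is a minimal sketch of a NeuralUCB-style router. It is not the authors' implementation. The names `utility`, `TinyNeuralUCB`, and `cost_weight`, the reward shape (quality minus weighted cost), the arm encoding (query embedding concatenated with a one-hot model id), and all data in the loop are assumptions made for illustration. The sketch also simplifies the published algorithm: it takes a single gradient step per round instead of retraining on the full history, and it omits the network-width scaling of the gradient features.

```python
import numpy as np


def utility(quality, cost, cost_weight=0.5):
    # Assumed reward shape: quality minus a cost penalty (not taken from the paper).
    return quality - cost_weight * cost


class TinyNeuralUCB:
    """One-hidden-layer ReLU network with a gradient-based UCB exploration bonus."""

    def __init__(self, dim, hidden=16, reg=1.0, gamma=1.0, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(dim), size=(hidden, dim))
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=hidden)
        self.gamma, self.lr = gamma, lr
        # Inverse of the design matrix Z, initialised as Z = reg * I.
        self.Z_inv = np.eye(hidden * dim + hidden) / reg

    def _forward_grad(self, x):
        pre = self.W1 @ x
        h = np.maximum(pre, 0.0)
        out = float(self.w2 @ h)
        # d out / d W1 = outer(w2 * relu'(pre), x);  d out / d w2 = h
        grad_W1 = np.outer(self.w2 * (pre > 0), x)
        return out, np.concatenate([grad_W1.ravel(), h])

    def bonus(self, x):
        _, g = self._forward_grad(x)
        return self.gamma * np.sqrt(max(float(g @ self.Z_inv @ g), 0.0))

    def select(self, arm_features):
        # arm_features: shape (n_models, dim), one feature row per candidate model.
        scores = [self._forward_grad(x)[0] + self.bonus(x) for x in arm_features]
        return int(np.argmax(scores))

    def update(self, x, reward):
        out, g = self._forward_grad(x)
        # Sherman-Morrison rank-one update of Z_inv for Z <- Z + g g^T.
        Zg = self.Z_inv @ g
        self.Z_inv -= np.outer(Zg, Zg) / (1.0 + g @ Zg)
        # Simplification: one SGD step on squared error for the observed arm only.
        err = out - reward
        n_w1 = self.W1.size
        self.W1 -= self.lr * err * g[:n_w1].reshape(self.W1.shape)
        self.w2 -= self.lr * err * g[n_w1:]


# Simulated online loop on synthetic data (illustrative only, not RouterBench).
rng = np.random.default_rng(1)
n_models, emb_dim = 3, 4
policy = TinyNeuralUCB(dim=emb_dim + n_models)
model_cost = np.array([0.1, 0.5, 1.0])  # synthetic per-call costs
total_reward = 0.0
for t in range(50):
    query = rng.normal(size=emb_dim)  # stand-in for a query embedding
    arms = np.stack(
        [np.concatenate([query, np.eye(n_models)[k]]) for k in range(n_models)]
    )
    k = policy.select(arms)
    # Synthetic outcome: pricier models succeed more often.
    quality = float(rng.random() < 0.3 + 0.3 * k)
    r = utility(quality, model_cost[k])
    policy.update(arms[k], r)  # partial feedback: only the chosen model's reward is seen
    total_reward += r
```

The loop mirrors the partial-feedback setting: each round the router observes a reward only for the model it selected, so the exploration bonus is what drives it to occasionally try models whose reward estimates are still uncertain. Raising `cost_weight` in this sketch would push the policy toward cheaper models, which is one way to picture the cost and quality tradeoff the abstract refers to.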