SE · LG · Sep 18, 2025

CARGO: A Framework for Confidence-Aware Routing of Large Language Models

arXiv:2509.14899v1 · 2 citations · h-index: 6 · CASCON
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of efficient multi-model LLM deployment for users who need cost-effective routing. The advance is incremental, since it builds on existing routing and confidence-estimation methods.

The paper tackles the problem of routing each user prompt to the most appropriate large language model (LLM) so as to balance performance and cost. It introduces CARGO, a confidence-aware framework that achieves a top-1 routing accuracy of 76.4% and win rates of 72% to 89% against individual expert models.

As large language models (LLMs) proliferate in scale, specialization, and latency profiles, the challenge of routing user prompts to the most appropriate model has become increasingly critical for balancing performance and cost. We introduce CARGO (Category-Aware Routing with Gap-based Optimization), a lightweight, confidence-aware framework for dynamic LLM selection. CARGO employs a single embedding-based regressor trained on LLM-judged pairwise comparisons to predict model performance, with an optional binary classifier invoked when predictions are uncertain. This two-stage design enables precise, cost-aware routing without the need for human-annotated supervision. To capture domain-specific behavior, CARGO also supports category-specific regressors trained across five task groups: mathematics, coding, reasoning, summarization, and creative writing. Evaluated on four competitive LLMs (GPT-4o, Claude 3.5 Sonnet, DeepSeek V3, and Perplexity Sonar), CARGO achieves a top-1 routing accuracy of 76.4% and win rates ranging from 72% to 89% against individual experts. These results demonstrate that confidence-guided, lightweight routing can achieve expert-level performance with minimal overhead, offering a practical solution for real-world, multi-model LLM deployments.
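
The two-stage design described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that a prediction counts as "uncertain" when the gap between the two highest predicted scores falls below a threshold. The names `route`, `gap_threshold`, and `tie_breaker`, the threshold value, and the example scores are all hypothetical.

```python
from typing import Callable, Dict, Optional


def route(
    scores: Dict[str, float],
    tie_breaker: Optional[Callable[[str, str], str]] = None,
    gap_threshold: float = 0.05,  # hypothetical value, not taken from the paper
) -> str:
    """Pick a model from predicted per-model performance scores.

    `scores` stands in for the output of the embedding-based regressor.
    `tie_breaker` stands in for the optional binary classifier; it receives
    the top two candidates and returns the one it prefers.
    """
    if len(scores) < 2:
        raise ValueError("need at least two candidate models to route between")

    # Rank candidates by predicted score, highest first.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_score), (runner_up, second_score) = ranked[0], ranked[1]

    # Assumption: a small gap between the top two scores signals uncertainty.
    gap = best_score - second_score
    if gap >= gap_threshold or tie_breaker is None:
        return best  # stage 1 only: confident, or no second stage available

    # Stage 2: defer to the binary classifier for the close call.
    return tie_breaker(best, runner_up)


# Illustrative scores only; these are not results from the paper.
example_scores = {"gpt-4o": 0.71, "claude-3.5-sonnet": 0.70, "deepseek-v3": 0.55}

# Without a second stage, the top-scoring model is returned despite the small gap.
print(route(example_scores))

# With a stand-in classifier that always prefers the runner-up.
print(route(example_scores, tie_breaker=lambda first, second: second))
```

Keeping the second stage optional mirrors the abstract's description: the cheaper regressor handles clear-cut prompts alone, and the classifier is consulted only for close calls.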

