LG · CL · May 11

UCCI: Calibrated Uncertainty for Cost-Optimal LLM Cascade Routing

arXiv:2605.18796 · 26.3
Predicted impact: top 77% in LG · last 90 days
Originality: Incremental advance
AI Analysis

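For practitioners deploying LLM cascades, UCCI offers a calibration-first routing method that replaces per-workload threshold tuning with constrained cost minimization over a calibrated error probability. The paper reports a 31% inference-cost reduction on a production NER workload, with ECE falling from 0.12 to 0.03.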

UCCI introduces a calibration-first router for LLM cascades that uses isotonic regression to map token-level margin uncertainty to error probabilities and selects escalation thresholds via constrained cost minimization. On a production NER workload with 75k queries, it reduces inference cost by 31% at micro-F1=0.91 while cutting ECE from 0.12 to 0.03.
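The calibration step summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each query already has a scalar uncertainty score (for example, one derived from token-level margins, whose exact definition is not given here) and a binary label saying whether the small model's answer was wrong. The helper names `fit_isotonic`, `predict`, and `ece` are hypothetical, and the isotonic fit uses the standard pool-adjacent-violators procedure in plain Python.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import List, Tuple


def fit_isotonic(scores: List[float], errors: List[int]) -> Tuple[List[float], List[float]]:
    """Fit a non-decreasing step map from uncertainty score to P(error).

    Uses pool-adjacent-violators. Returns (upper score bound, fitted value)
    for each block, both in increasing score order.
    """
    # Pool tied scores first so each distinct score enters the fit once.
    agg = defaultdict(lambda: [0.0, 0.0])
    for s, e in zip(scores, errors):
        agg[s][0] += e
        agg[s][1] += 1
    # Each block is [sum of error labels, count, largest score in block].
    blocks: List[List[float]] = []
    for s in sorted(agg):
        blocks.append([agg[s][0], agg[s][1], s])
        # Merge backwards while the block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, top = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = top
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def predict(bounds: List[float], values: List[float], score: float) -> float:
    """Look up the calibrated error probability for one score (step function)."""
    return values[min(bisect_left(bounds, score), len(values) - 1)]


def ece(probs: List[float], errors: List[int], n_bins: int = 10) -> float:
    """Expected calibration error with equal-width bins on [0, 1]."""
    bins = [[0.0, 0.0] for _ in range(n_bins)]  # [sum of probs, sum of labels]
    for p, e in zip(probs, errors):
        b = min(int(p * n_bins), n_bins - 1)
        bins[b][0] += p
        bins[b][1] += e
    # |mean prob - mean label| weighted by bin size reduces to |sum_p - sum_e| / n.
    return sum(abs(sp - se) for sp, se in bins) / len(probs)
```

In practice the map would be fitted on a held-out calibration split and ECE measured on separate data; evaluating on the fitting data, as a toy check might, understates the error.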

LLM cascades and model routing promise lower inference cost by sending easy queries to a small model and escalating hard ones to a large model, but most deployed routers use uncalibrated confidence scores and require per-workload threshold tuning. We present UCCI, a calibration-first router that maps token-level margin uncertainty to a per-query error probability via isotonic regression and selects the escalation threshold by constrained cost minimization. Under three explicit assumptions, threshold policies on the calibrated score are cost-optimal, and isotonic calibration achieves O(n^{-1/3}) sample complexity for expected calibration error (ECE). On a production named entity recognition workload of 75,000 queries served by 4B and 12B instruction-tuned LLMs on H100 GPUs, UCCI cuts inference cost by 31% (95% CI: [27%, 35%]) at micro-F1 = 0.91 while reducing ECE from 0.12 to 0.03. At the same operating point, UCCI beats entropy thresholding, split-conformal routing, and a FrugalGPT-style learned threshold. All cascade results use end-to-end routing on actual model outputs and measured H100 latency, not simulated routing from global accuracies or nominal API prices.
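To make the threshold-selection step concrete, here is a hedged sketch of constrained cost minimization over calibrated error probabilities. The abstract does not state the constraint or the cost model, so everything below is an assumption for illustration: the small model always runs at `cost_small`, escalated queries additionally pay `cost_large`, the large model has a fixed error rate `large_err`, and the constraint is a cap `max_err` on the expected error rate. The function name `select_threshold` is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Optional, Tuple


def select_threshold(
    p_err: List[float],
    cost_small: float,
    cost_large: float,
    large_err: float,
    max_err: float,
) -> Optional[Tuple[float, float, float]]:
    """Pick the cheapest threshold policy that meets an error budget.

    Policy: escalate a query to the large model iff its calibrated error
    probability exceeds the threshold. Returns (threshold, mean cost per
    query, expected error rate), or None if no threshold is feasible.
    """
    ps = sorted(p_err)
    n = len(ps)
    prefix = [0.0] + list(accumulate(ps))  # prefix[k] = sum of the k smallest
    # -inf escalates everything; each distinct probability is a candidate cut.
    candidates = [float("-inf")] + sorted(set(ps))
    best: Optional[Tuple[float, float, float]] = None
    for t in candidates:
        kept = bisect_right(ps, t)  # queries answered by the small model
        escalated = n - kept
        # Assumed error model: kept queries err with their calibrated
        # probability, escalated queries err at the large model's rate.
        err = (prefix[kept] + large_err * escalated) / n
        # Assumed cost model: small model always runs, large model on escalation.
        cost = cost_small + cost_large * escalated / n
        if err <= max_err and (best is None or cost < best[1]):
            best = (t, cost, err)
    return best
```

The sweep only considers threshold policies on the calibrated score, which is the policy class the abstract describes as cost-optimal under its stated assumptions; those assumptions are not reproduced here.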
