LG · Aug 17, 2025

Cost-Aware Contrastive Routing for LLMs

arXiv:2508.12491v2 · 10 citations · h-index: 17 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses efficient model selection for users of LLM services, though it appears to be an incremental advance over existing routing approaches.

The paper tackles the problem of cost-aware routing for large language models across diverse model pools by introducing Cost-Spectrum Contrastive Routing (CSCR), which improves the accuracy-cost tradeoff by up to 25% across multiple benchmarks.

We study cost-aware routing for large language models across diverse and dynamic pools of models. Existing approaches often overlook prompt-specific context, rely on expensive model profiling, assume a fixed set of experts, or use inefficient trial-and-error strategies. We introduce Cost-Spectrum Contrastive Routing (CSCR), a lightweight framework that maps both prompts and models into a shared embedding space to enable fast, cost-sensitive selection. CSCR uses compact, fast-to-compute logit footprints for open-source models and perplexity fingerprints for black-box APIs. A contrastive encoder is trained to favor the cheapest accurate expert within adaptive cost bands. At inference time, routing reduces to a single k-NN lookup via a FAISS index, requiring no retraining when the expert pool changes and enabling microsecond latency. Across multiple benchmarks, CSCR consistently outperforms baselines, improving the accuracy-cost tradeoff by up to 25%, while generalizing robustly to unseen LLMs and out-of-distribution prompts.
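The inference step described in the abstract, a nearest-neighbor lookup between a prompt embedding and expert embeddings in a shared space, can be illustrated with a minimal sketch. This is not the authors' implementation: the `ExpertPool` class, its method names, and the toy two-dimensional embeddings are hypothetical, and a brute-force cosine-similarity search in pure Python stands in for the FAISS index the paper uses. The sketch assumes the embeddings already come from a trained contrastive encoder, so any cost sensitivity lives in the vectors themselves rather than in the lookup.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class ExpertPool:
    """Hypothetical expert pool with brute-force k-NN routing.

    Assumption: each expert embedding was produced offline (for example
    from a logit footprint or perplexity fingerprint) and lives in the
    same space as the prompt embeddings.
    """

    def __init__(self):
        self._names = []
        self._vectors = []

    def add(self, name, embedding):
        # Adding an expert only appends a vector; nothing is retrained.
        self._names.append(name)
        self._vectors.append(normalize(embedding))

    def remove(self, name):
        # Removing an expert only deletes its vector.
        i = self._names.index(name)
        del self._names[i]
        del self._vectors[i]

    def route(self, prompt_embedding, k=1):
        """Return the names of the k experts most similar to the prompt."""
        q = normalize(prompt_embedding)
        scored = [
            (sum(a * b for a, b in zip(q, v)), name)
            for name, v in zip(self._names, self._vectors)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [name for _, name in scored[:k]]
```

A short usage example under the same assumptions: after `pool = ExpertPool()`, `pool.add("small-model", [1.0, 0.0])`, and `pool.add("large-model", [0.0, 1.0])`, the call `pool.route([0.9, 0.1])` returns `["small-model"]`. Calling `add` or `remove` changes the pool without touching the encoder, which mirrors the abstract's claim that no retraining is needed when the expert pool changes. At scale, the linear scan would be replaced by an approximate or exact vector index.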

