CL  Oct 14, 2024

A Unified Approach to Routing and Cascading for LLMs

arXiv:2410.10347v347 citations  h-index: 64  ICML
Originality: Incremental advance
AI Analysis

This paper addresses the cost-performance tradeoff in LLM deployment for users and developers, but it is incremental because it builds on existing routing and cascading paradigms.

The paper tackles the problem of model selection for large language models (LLMs) by proposing a unified framework that integrates routing and cascading, and reports that this framework consistently outperforms the individual approaches by a large margin in experiments.

The availability of a wide range of large language models (LLMs) embedded in various agentic systems has significantly increased the potential of model selection strategies to improve the cost-performance tradeoff. Existing strategies involve either routing, where a single model is chosen per query, or cascading, which sequentially runs increasingly larger models until a satisfactory answer is found. However, current approaches face three key limitations: they (1) lack formal proofs of optimality, (2) fail to identify the conditions under which these strategies are most effective to improve the cost-performance tradeoff, and (3) are unable to combine both paradigms for further improvements. To address these issues, we first derive a novel optimal strategy for cascading and prove the optimality of an existing routing strategy. Further, we propose cascade routing, a unified framework that integrates routing and cascading into a theoretically optimal strategy. Through our analysis, we identify good quality estimators as the critical factor for the success of model selection paradigms. Finally, in our experiments, we show that cascade routing consistently outperforms the individual approaches by a large margin and we analyze quality estimators to determine when routing and/or cascading are useful paradigms for model selection.
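The two paradigms contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the paper's optimal strategy: the `Model` class, the `route` and `cascade` functions, the linear quality-minus-cost score with weight `lam`, the fixed `threshold` stopping rule, and the `toy_quality` estimator are all hypothetical names and simplifications chosen for illustration. Model cost stands in for model size.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Model:
    """Hypothetical model wrapper: a name, a per-query cost, and an answer function."""
    name: str
    cost: float
    answer: Callable[[str], str]


def route(query: str, models: List[Model],
          estimate_quality: Callable[[Model, str], float], lam: float) -> Model:
    """Routing: pick exactly one model per query.

    Assumption: the score is estimated quality minus lam times cost.
    """
    return max(models, key=lambda m: estimate_quality(m, query) - lam * m.cost)


def cascade(query: str, models: List[Model],
            answer_quality: Callable[[Model, str, str], float],
            threshold: float) -> Tuple[Model, str, float]:
    """Cascading: run models from cheapest to most expensive and stop as soon
    as the estimated quality of the current answer reaches the threshold.

    Assumption: a fixed threshold is used as the stopping rule.
    """
    total_cost = 0.0
    used, answer = None, ""
    for m in sorted(models, key=lambda m: m.cost):
        answer = m.answer(query)
        total_cost += m.cost  # every model that runs is paid for
        used = m
        if answer_quality(m, query, answer) >= threshold:
            break
    return used, answer, total_cost


def toy_quality(model: Model, query: str) -> float:
    """Toy quality estimator (made-up numbers, for illustration only)."""
    if model.name == "large":
        return 0.95
    return 0.9 if "easy" in query else 0.3


small = Model("small", cost=1.0, answer=lambda q: "small answer to " + q)
large = Model("large", cost=10.0, answer=lambda q: "large answer to " + q)
models = [small, large]

routed_easy = route("easy question", models, toy_quality, lam=0.05)
routed_hard = route("hard question", models, toy_quality, lam=0.05)

casc_easy = cascade("easy question", models,
                    lambda m, q, a: toy_quality(m, q), threshold=0.8)
casc_hard = cascade("hard question", models,
                    lambda m, q, a: toy_quality(m, q), threshold=0.8)
```

In this toy setup, routing pays for only one model per query, while cascading on the hard query pays for both the small and the large model. Both functions depend entirely on the quality estimator passed in, which is consistent with the abstract's claim that good quality estimators are the critical factor.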

