Breaking Model Lock-in: Cost-Efficient Zero-Shot LLM Routing via a Universal Latent Space
This work addresses a bottleneck for users in fragmented AI ecosystems: integrating novel LLMs into a routing system is inefficient and costly. The authors present it as a new paradigm rather than an incremental improvement.
The paper tackles the problem of model lock-in in LLM routing by introducing ZeroRouter, a zero-shot routing framework that uses a universal latent space to onboard new models without full-scale retraining, achieving higher accuracy at lower cost and latency than the baselines.
The rapid proliferation of Large Language Models (LLMs) has led to a fragmented and inefficient ecosystem, a state of "model lock-in" where seamlessly integrating novel models remains a significant bottleneck. Current routing frameworks require exhaustive, costly retraining, hindering scalability and adaptability. We introduce ZeroRouter, a new paradigm for LLM routing that breaks this lock-in. Our approach is founded on a universal latent space, a model-agnostic representation of query difficulty that fundamentally decouples the characterization of a query from the profiling of a model. This allows for zero-shot onboarding of new models without full-scale retraining. ZeroRouter features a context-aware predictor that maps queries to this universal space and a dual-mode optimizer that balances accuracy, cost, and latency. Our framework consistently outperforms all baselines, delivering higher accuracy at lower cost and latency.
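To make the decoupling idea concrete, the following is a minimal sketch and not the ZeroRouter implementation. It assumes that query difficulty and model ability are single scalars on a shared scale, that predicted accuracy follows a logistic link between them, and that "dual-mode" means either maximizing accuracy under a cost budget or minimizing cost under an accuracy floor. None of these details are stated in the abstract; the names `ModelProfile`, `predicted_accuracy`, and `route`, and all numeric values, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-model profile; every field is an illustrative assumption."""
    name: str
    ability: float   # position of the model on the shared latent scale
    cost: float      # cost per query, arbitrary units
    latency: float   # seconds per query


def predicted_accuracy(difficulty: float, model: ModelProfile) -> float:
    """Assumed logistic link: success is likelier when ability exceeds difficulty."""
    return 1.0 / (1.0 + math.exp(-(model.ability - difficulty)))


def route(difficulty, models, mode, budget=float("inf"), min_accuracy=0.0):
    """Pick a model for a query whose latent difficulty is already known.

    mode="max_accuracy": best predicted accuracy among models within the cost budget.
    mode="min_cost": cheapest model whose predicted accuracy meets the floor.
    """
    scored = [(predicted_accuracy(difficulty, m), m) for m in models]
    if mode == "max_accuracy":
        feasible = [(a, m) for a, m in scored if m.cost <= budget]
        if not feasible:
            raise ValueError("no model fits the cost budget")
        # Ties on accuracy are broken in favor of lower latency.
        return max(feasible, key=lambda am: (am[0], -am[1].latency))[1]
    if mode == "min_cost":
        feasible = [(a, m) for a, m in scored if a >= min_accuracy]
        if not feasible:
            # No model meets the floor: fall back to the most accurate one.
            return max(scored, key=lambda am: am[0])[1]
        # Ties on cost are broken in favor of lower latency.
        return min(feasible, key=lambda am: (am[1].cost, am[1].latency))[1]
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical model pool.
pool = [
    ModelProfile("small", ability=0.0, cost=1.0, latency=0.5),
    ModelProfile("large", ability=2.0, cost=10.0, latency=2.0),
]

# Onboarding a new model only appends a profile; the query-side
# difficulty estimate and the routing logic stay unchanged.
extended_pool = pool + [ModelProfile("new", ability=1.5, cost=3.0, latency=1.0)]
```

In this toy setup, the query side (a difficulty value) never refers to any particular model, so adding the `new` profile changes routing decisions without touching how queries are characterized. How ZeroRouter actually profiles a new model and predicts difficulty from context is not described in the abstract and is not modeled here.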