Dynamic LLM Routing and Selection based on User Preferences: Balancing Performance, Cost, and Ethics
This work addresses the need for cost-effective and ethical AI deployment in cloud platforms and regulated industries, but its contribution is incremental: it builds on existing routing and filtering methods.
The paper tackles the problem of selecting the most suitable large language model for a given task by balancing performance, cost, and ethics. It introduces OptiRoute, a routing engine that dynamically matches tasks to the best-fit model based on user-defined criteria, with overhead low enough for real-time applications.
With the widespread deployment of large language models (LLMs) such as GPT-4, BART, and LLaMA, the need for a system that can intelligently select the most suitable model for specific tasks while balancing cost, latency, accuracy, and ethical considerations has become increasingly important. Recognizing that not all tasks necessitate models with over 100 billion parameters, we introduce OptiRoute, an advanced model routing engine designed to dynamically select and route tasks to the optimal LLM based on detailed user-defined requirements. OptiRoute captures both functional (e.g., accuracy, speed, cost) and non-functional (e.g., helpfulness, harmlessness, honesty) criteria, leveraging lightweight task analysis and complexity estimation to efficiently match tasks with the best-fit models from a diverse array of LLMs. By employing a hybrid approach combining k-nearest neighbors (kNN) search and hierarchical filtering, OptiRoute optimizes for user priorities while minimizing computational overhead. This makes it ideal for real-time applications in cloud-based ML platforms, personalized AI services, and regulated industries.
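The hybrid of kNN search and hierarchical filtering described above can be illustrated with a minimal sketch. It is not OptiRoute's implementation. The names `ModelProfile`, `route`, `MODELS`, and `PREFERENCE`, the four-dimensional score vector, the stage order (kNN first, then filters), and all numeric scores are assumptions made for illustration only.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-model scores, each normalized to [0, 1]."""
    name: str
    accuracy: float         # higher is more accurate
    speed: float            # higher is faster
    cost_efficiency: float  # higher is cheaper
    harmlessness: float     # higher is safer
    max_complexity: float   # hardest task the model is assumed to handle

    def vector(self) -> Tuple[float, ...]:
        return (self.accuracy, self.speed, self.cost_efficiency, self.harmlessness)


def route(models: Sequence[ModelProfile],
          preference: Sequence[float],
          task_complexity: float,
          min_harmlessness: float = 0.0,
          k: int = 2) -> Optional[ModelProfile]:
    """Return the model closest to the user's preference that passes all filters."""
    # Stage 1 (kNN): keep the k profiles nearest to the preference vector,
    # ordered by Euclidean distance.
    nearest = sorted(models, key=lambda m: math.dist(m.vector(), preference))[:k]
    # Stage 2 (hierarchical filtering): apply the capability filter first,
    # then the ethical threshold, narrowing the candidate set at each level.
    capable = [m for m in nearest if m.max_complexity >= task_complexity]
    safe = [m for m in capable if m.harmlessness >= min_harmlessness]
    # The survivors keep their distance order, so the first one is the best fit.
    return safe[0] if safe else None


# Illustrative profiles only; these numbers are invented, not measured.
MODELS = [
    ModelProfile("large", 0.95, 0.30, 0.20, 0.90, 1.0),
    ModelProfile("medium", 0.85, 0.60, 0.60, 0.85, 0.7),
    ModelProfile("small", 0.70, 0.95, 0.95, 0.80, 0.4),
]

# A user who prioritizes speed and low cost over peak accuracy.
PREFERENCE = (0.70, 0.90, 0.90, 0.80)

if __name__ == "__main__":
    for complexity in (0.3, 0.6):
        chosen = route(MODELS, PREFERENCE, task_complexity=complexity)
        print(complexity, chosen.name if chosen else None)
```

In this sketch a simple task goes to the small model, while a harder task falls through to the medium one because the small model fails the capability filter. Returning `None` when no candidate survives leaves the fallback policy (for example, widening `k`) to the caller.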