LG | AI | DC | MA | Nov 24, 2024

Ensuring Fair LLM Serving Amid Diverse Applications

arXiv:2411.15997v1 | 4 citations | h-index: 28
Originality: Incremental advance
AI Analysis

This work addresses fairness for users of multi-tenant LLM platforms such as MS CoPilot, but the advance is incremental: it builds on existing throttling and scheduling techniques.

The paper tackles the problem of unfair LLM serving in multi-tenant platforms by analyzing real-world data from MS CoPilot and developing FairServe, which improves fairness over state-of-the-art methods in experiments.

In a multi-tenant large language model (LLM) serving platform hosting diverse applications, some users may submit an excessive number of requests, making the service unavailable to other users and creating unfairness. Existing fairness approaches do not account for variations in token lengths across applications or for interactions that involve multiple LLM calls, making them unsuitable for such platforms. To address the fairness challenge, this paper analyzes millions of requests from thousands of users on MS CoPilot, a real-world multi-tenant LLM platform hosted by Microsoft. Our analysis confirms the inadequacy of existing methods and guides the development of FairServe, a system that ensures fair LLM access across diverse applications. FairServe combines application-characteristic-aware request throttling with a scheduling technique based on weighted service counters to curb abusive behavior and ensure fairness. Our experimental results on real-world traces demonstrate that FairServe outperforms the state-of-the-art method in ensuring fairness. We are actively working on deploying our system in production, where we expect it to benefit millions of customers worldwide.
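To make the idea of counter-based fair scheduling concrete, the following is a minimal sketch and not FairServe's actual algorithm, which the abstract does not specify. It assumes each user has a counter of weighted service received, that output tokens are charged more than input tokens, and that the scheduler always serves the backlogged user with the smallest counter. The class name, the weights, and the simplification of charging the full token cost at dispatch time are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    user: str
    input_tokens: int
    output_tokens: int


@dataclass
class WeightedCounterScheduler:
    # Hypothetical weights: output tokens are assumed to cost more than input tokens.
    input_weight: float = 1.0
    output_weight: float = 2.0
    counters: dict = field(default_factory=dict)  # user -> weighted service received
    queues: dict = field(default_factory=dict)    # user -> pending requests (FIFO)

    def submit(self, req: Request) -> None:
        """Queue a request under its user and make sure the user has a counter."""
        self.queues.setdefault(req.user, deque()).append(req)
        self.counters.setdefault(req.user, 0.0)

    def next_request(self):
        """Dispatch from the backlogged user with the least weighted service so far."""
        backlogged = [u for u, q in self.queues.items() if q]
        if not backlogged:
            return None
        # Ties are broken by user name so the order is deterministic.
        user = min(backlogged, key=lambda u: (self.counters[u], u))
        req = self.queues[user].popleft()
        # Simplification: charge the whole request up front.
        self.counters[user] += (self.input_weight * req.input_tokens
                                + self.output_weight * req.output_tokens)
        return req
```

With this policy, a user who submits three requests ahead of another user's single request is served once, then yields to the lighter user before being served again, whereas plain first-come-first-served would make the lighter user wait behind all three.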

