CL AI DC LG · Jan 21, 2025

AdaServe: Accelerating Multi-SLO LLM Serving with SLO-Customized Speculative Decoding

Zikun Li, Zhuofu Chen, Remi Delacourt, Gabriele Oliaro, Zeyu Wang, Qinghan Chen, Shuhuai Lin, April Yang, Zhihao Zhang, Zhuoming Chen, Sean Lai, Xinhao Cheng

arXiv:2501.12162v2 · 18.816 citations · h-index: 11

Originality: Incremental advance

AI Analysis

This work addresses the challenge of meeting heterogeneous latency requirements in LLM applications such as interactive coding assistants, but it is incremental because it builds on existing speculative decoding methods.

The paper tackles the problem of efficiently serving large language models with diverse service-level objectives (SLOs) by introducing AdaServe, a system that uses SLO-customized speculative decoding. AdaServe reduces SLO violations by up to 4.3× and improves goodput by up to 1.9× compared to the best-performing baselines.

Modern large language model (LLM) applications exhibit diverse service-level objectives (SLOs), from low-latency requirements in interactive coding assistants to more relaxed constraints in data wrangling tasks. Existing LLM serving systems, which rely on uniform batching and scheduling strategies, often fail to meet these heterogeneous SLOs concurrently. We present AdaServe, the first LLM serving system designed to support efficient multi-SLO serving through SLO-customized speculative decoding. AdaServe formulates multi-SLO serving as a constrained optimization problem and introduces a hardware-aware algorithm that constructs a speculation tree tailored to each request's latency target. It features a speculate-select-verify pipeline that enables fine-grained control over decoding speed while maximizing system throughput. AdaServe further adapts to workload variation by dynamically adjusting speculation parameters. Evaluations across diverse workloads show that AdaServe reduces SLO violations by up to 4.3× and improves goodput by up to 1.9× compared to the best-performing baselines, highlighting its effectiveness in multi-SLO serving.
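The abstract does not spell out the selection algorithm, so the following is only a rough, hypothetical sketch of the "select" step in a speculate-select-verify loop, not AdaServe's method. It assumes: each request drafts a linear chain of tokens (rather than a tree) with estimated per-token acceptance probabilities; one verification step costs a fixed `step_ms` (a stand-in for the paper's hardware-aware cost model); and the batch can verify at most `budget` draft tokens per step. A request with per-token target `slo_ms` then needs about `step_ms / slo_ms` expected tokens per step. The names `Request`, `cumulative`, `select_tokens`, `slo_ms`, `draft_probs`, `step_ms`, and `budget` are all invented for illustration. The sketch first covers each request's latency target (tightest first), then spends leftover budget on whichever draft token adds the most expected accepted tokens, as a simple proxy for throughput.

```python
# Hypothetical sketch only; NOT AdaServe's published algorithm.
import heapq
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    rid: str
    slo_ms: float             # assumed per-token latency target (ms/token)
    draft_probs: List[float]  # assumed acceptance estimate for each draft token


def cumulative(probs: List[float]) -> List[float]:
    """Probability that the first i+1 draft tokens are all accepted."""
    out, acc = [], 1.0
    for p in probs:
        acc *= p
        out.append(acc)
    return out


def select_tokens(reqs: List[Request], budget: int, step_ms: float) -> Dict[str, int]:
    """Return how many draft tokens of each request to verify in one step."""
    chosen = {r.rid: 0 for r in reqs}
    cum = {r.rid: cumulative(r.draft_probs) for r in reqs}
    expected = {r.rid: 1.0 for r in reqs}  # verification always emits one token

    # Phase 1 (SLO-driven): tightest targets first; add draft tokens until the
    # expected tokens per step reach step_ms / slo_ms or the budget runs out.
    for r in sorted(reqs, key=lambda r: r.slo_ms):
        need = step_ms / r.slo_ms
        while (budget > 0 and expected[r.rid] < need
               and chosen[r.rid] < len(cum[r.rid])):
            expected[r.rid] += cum[r.rid][chosen[r.rid]]
            chosen[r.rid] += 1
            budget -= 1

    # Phase 2 (throughput): give leftover budget to the next draft token with
    # the largest marginal gain in expected accepted tokens across requests.
    # Cumulative probabilities never increase along a chain, so extending
    # chains greedily keeps each selection a prefix of its draft.
    heap = [(-cum[r.rid][chosen[r.rid]], r.rid) for r in reqs
            if chosen[r.rid] < len(cum[r.rid])]
    heapq.heapify(heap)
    while budget > 0 and heap:
        _, rid = heapq.heappop(heap)
        chosen[rid] += 1
        budget -= 1
        if chosen[rid] < len(cum[rid]):
            heapq.heappush(heap, (-cum[rid][chosen[rid]], rid))
    return chosen


if __name__ == "__main__":
    reqs = [
        Request("coding", slo_ms=20.0, draft_probs=[0.9, 0.8, 0.7, 0.6]),
        Request("wrangling", slo_ms=60.0, draft_probs=[0.5, 0.5]),
    ]
    print(select_tokens(reqs, budget=4, step_ms=30.0))
    # prints {'coding': 3, 'wrangling': 1}
```

Here the tight-SLO "coding" request needs 1.5 expected tokens per 30 ms step, so it receives a draft token first. The relaxed "wrangling" request already meets its target with the single verified token, and the remaining budget goes to the highest-probability continuations. A real system would also need tree-shaped drafts and a measured cost model, which this sketch omits.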

