DCAIJan 29

EWSJF: An Adaptive Scheduler with Hybrid Partitioning for Mixed-Workload LLM Inference

arXiv:2601.21758v12 citationsh-index: 4Has Code
Originality Highly original
AI Analysis

This addresses a fundamental scheduling problem for efficient and responsive LLM serving, offering significant performance gains for systems handling mixed workloads.

The paper tackles the challenge of scheduling mixed workloads for LLM inference, where short interactive queries and long batch requests compete, by introducing EWSJF, an adaptive scheduler that improves throughput by over 30% and reduces average Time-To-First-Token for short requests by up to 4x compared to standard FCFS policies.

Serving Large Language Models (LLMs) under mixed workloads--short, latency-sensitive interactive queries alongside long, throughput-oriented batch requests--poses a fundamental scheduling challenge. Standard First-Come, First-Served (FCFS) policies suffer from severe head-of-line blocking, leading to high tail latency and underutilized hardware. We introduce EWSJF (Effective Workload-based Shortest Job First), an adaptive request-level scheduler that learns workload structure in real time to jointly improve fairness and throughput. EWSJF operates upstream of execution-level schedulers and integrates four components: (1) Refine-and-Prune, an unsupervised partitioning algorithm that discovers performance-homogeneous request groups; (2) Dynamic Queue Routing for assigning requests to these groups; (3) Density-Weighted Scoring, a context-aware prioritization function balancing urgency and fairness; and (4) Bayesian Meta-Optimization, which continuously tunes scoring and partitioning parameters based on live performance feedback. Implemented in vLLM, EWSJF improves end-to-end throughput by over 30% and reduces average Time-To-First-Token for short requests by up to 4x compared to FCFS. These results demonstrate that adaptive, learning-based request scheduling is a critical missing layer for efficient and responsive LLM serving. Implementation available at https://anonymous.4open.science/r/vllm_0110-32D8.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes