LG AI DC | Oct 19, 2025

Justitia: Fair and Efficient Scheduling for LLM Applications

Mingyan Yang, Guanjie Wang, Manqi Luo, Yifei Liu, Chen Chen, Han Zhao, Yu Feng, Quan Chen, Minyi Guo

arXiv:2510.17015v14.1 | h-index: 10

Originality: Incremental advance

AI Analysis

This work addresses scheduling inefficiencies for LLM applications on shared GPU servers with a scheduler that balances fairness and performance. It appears incremental because it builds on existing frameworks such as vLLM.

The paper tackles the problem of scheduling LLM applications on shared GPU servers, where existing schedulers suffer from head-of-line blocking or over-constrained resource allocation. It proposes Justitia, a scheduler that improves scheduling efficiency while preserving fairness, as shown in experiments with diverse LLM applications.

In the era of Large Language Models (LLMs), it has become popular to launch a series of LLM inferences, which we call an LLM application, to better solve real-world problems. When serving such applications on shared GPU servers, schedulers are expected to attain fast application completion with guaranteed worst-case performance. However, mainstream LLM schedulers behave poorly for LLM applications because of head-of-line blocking or over-constrained resource allocation. In this paper, we propose to serve LLM applications in a manner that is both fair and efficient. To this end, we design Justitia, a novel scheduler with three key techniques. First, given that memory is commonly the bottleneck for mainstream inference frameworks such as vLLM, Justitia models the service cost of LLM applications in a memory-centric manner. Second, it uses a simple neural network model to conduct lightweight yet accurate demand prediction. Third, Justitia adopts a virtual-time-based fair queuing algorithm to improve overall performance while guaranteeing worst-case delay. We have implemented Justitia atop vLLM, and experimental results involving diverse LLM applications show that it can substantially enhance scheduling efficiency while preserving fairness.
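To make the combination of a memory-centric cost and virtual-time ordering more concrete, the following is a minimal Python sketch under stated assumptions. It is not Justitia's implementation. The cost function `kv_token_steps` (KV-cache tokens held, summed over decode steps), the `App` and `VirtualTimeQueue` names, and the hard-coded output lengths standing in for a learned demand predictor are all hypothetical. The ordering rule is a simplified finish-tag scheme in the spirit of self-clocked fair queuing, not necessarily the algorithm used in the paper.

```python
import heapq
from dataclasses import dataclass
from itertools import count


def kv_token_steps(prompt_len: int, output_len: int) -> int:
    """Hypothetical memory-centric cost of one inference.

    Assumes the KV cache holds prompt_len + i tokens at decode step i,
    and sums that occupancy over all output_len decode steps.
    """
    return prompt_len * output_len + output_len * (output_len + 1) // 2


@dataclass
class App:
    """An LLM application: a series of inferences (hypothetical structure)."""
    name: str
    calls: list  # (prompt_len, predicted_output_len) pairs
    weight: float = 1.0

    @property
    def cost(self) -> int:
        # Application cost = sum of per-inference memory costs.
        return sum(kv_token_steps(p, d) for p, d in self.calls)


class VirtualTimeQueue:
    """Serves applications in order of virtual finish tag."""

    def __init__(self) -> None:
        self.virtual_time = 0.0
        self._heap = []
        self._tie = count()  # breaks ties so App objects are never compared

    def submit(self, app: App) -> float:
        # Tag is fixed at arrival: current virtual time plus weighted cost.
        finish_tag = self.virtual_time + app.cost / app.weight
        heapq.heappush(self._heap, (finish_tag, next(self._tie), app))
        return finish_tag

    def pop_next(self) -> App:
        finish_tag, _, app = heapq.heappop(self._heap)
        # Self-clocking: virtual time follows the tag of the app being served.
        self.virtual_time = max(self.virtual_time, finish_tag)
        return app


queue = VirtualTimeQueue()
queue.submit(App("big", [(100, 50)]))
queue.submit(App("small", [(10, 4)]))
queue.submit(App("medium", [(10, 4), (20, 10)]))
order = [queue.pop_next().name for _ in range(3)]
print(order)
```

In this sketch, applications with a smaller predicted memory cost are served first, which avoids a large application blocking small ones at the head of the queue. Because each tag is fixed at arrival and virtual time only moves forward, arrivals far enough in the future are tagged after applications already waiting, which is the intuition behind bounding delay in this family of schedulers.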
