DC · Apr 6

GENSERVE: Efficient Co-Serving of Heterogeneous Diffusion Model Workloads

Fanjiang Ye, Zhangke Li, Xinrui Zhong, Ethan Ma, Russell Chen, Kaijian Wang, Jingwei Zuo, Desen Sun, Ye Cao, Triston Cao, Myungjin Lee, Arvind Krishnamurthy

arXiv:2604.04335 · 84.9

Predicted impact: top 3% in DC · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses a challenge that production platforms face: efficiently serving mixed AI workloads. It is incremental, however, because it builds on existing diffusion-model serving systems.

The paper tackles the problem of co-serving heterogeneous diffusion model workloads (text-to-image and text-to-video) on shared GPU clusters while meeting latency SLOs. The result is GENSERVE, a system that improves SLO attainment rates by up to 44% over the strongest baseline.

Diffusion models have emerged as the prevailing approach for text-to-image (T2I) and text-to-video (T2V) generation, yet production platforms must increasingly serve both modalities on shared GPU clusters while meeting stringent latency SLOs. Co-serving such heterogeneous workloads is challenging: T2I and T2V requests exhibit vastly different compute demands, parallelism characteristics, and latency requirements, leading to significant SLO violations in existing serving systems. We present GENSERVE, a co-serving system that leverages the inherent predictability of the diffusion process to optimize serving efficiency. A central insight is that diffusion inference proceeds in discrete, predictable steps and is naturally preemptible at step boundaries, opening a new design space for heterogeneity-aware resource management. GENSERVE introduces step-level resource adaptation through three coordinated mechanisms: intelligent video preemption, elastic sequence parallelism with dynamic batching, and an SLO-aware scheduler that jointly optimizes resource allocation across all concurrent requests. Experimental results show that GENSERVE improves the SLO attainment rate by up to 44% over the strongest baseline across diverse configurations.
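The central insight above is that diffusion inference runs in discrete steps and can be preempted at step boundaries. A toy single-GPU scheduler makes this concrete. The sketch below is illustrative only and is not GENSERVE's scheduler. It substitutes a simple least-slack-first policy, assumes a constant per-step latency for each request, and ignores batching and sequence parallelism. The names `Request` and `run_least_slack_first` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # compare requests by identity, not by field values
class Request:
    """Hypothetical diffusion request: a fixed number of denoising steps."""
    name: str
    remaining_steps: int
    step_time: float              # assumed constant seconds per denoising step
    deadline: float               # absolute SLO deadline, in seconds
    arrival: float = 0.0          # absolute arrival time, in seconds
    finish_time: Optional[float] = None

    def slack(self, now: float) -> float:
        # Spare time left if this request ran uninterrupted from `now`.
        return self.deadline - now - self.remaining_steps * self.step_time


def run_least_slack_first(requests: List[Request]) -> List[Request]:
    """Toy single-GPU loop that re-decides at every step boundary.

    Each iteration executes exactly one denoising step of the ready request
    with the least slack, so a long video request is effectively preempted
    as soon as a tighter image request arrives. No batching or parallelism.
    """
    now = 0.0
    pending = list(requests)
    while pending:
        ready = [r for r in pending if r.arrival <= now]
        if not ready:
            now = min(r.arrival for r in pending)  # idle until next arrival
            continue
        current = min(ready, key=lambda r: r.slack(now))
        now += current.step_time                   # run one denoising step
        current.remaining_steps -= 1
        if current.remaining_steps == 0:
            current.finish_time = now
            pending.remove(current)
    return requests


# Example: a long T2V job is running when a short T2I job with a tight SLO arrives.
video = Request("t2v", remaining_steps=50, step_time=0.5, deadline=60.0)
image = Request("t2i", remaining_steps=20, step_time=0.05, deadline=4.0, arrival=2.0)
run_least_slack_first([video, image])
for r in (video, image):
    status = "met" if r.finish_time <= r.deadline else "missed"
    print(f"{r.name}: finished at {r.finish_time:.2f}s, SLO {status}")
```

In this toy trace, the image request takes over the GPU at the 2 s step boundary and finishes at about 3 s. The video then resumes and completes at about 26 s, so both SLOs are met. A non-preemptive first-come-first-served order would instead hold the image request until the video finished at 25 s, and the image would miss its 4 s deadline.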

