DC · AI · LG · Jul 2, 2024

SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules

arXiv:2407.02031v2 · 7 citations · h-index: 11
Originality: Incremental advance
AI Analysis

This paper addresses performance bottlenecks in production AI cloud services for text-to-image generation, offering incremental improvements in serving efficiency.

The paper tackles the inefficiency of serving text-to-image generation workflows with add-on modules like ControlNet and LoRA, which increase latency, by introducing SwiftDiffusion, a system that achieves up to 7.8x latency reduction and 1.6x throughput improvement for SDXL models on H800 GPUs.

Text-to-image (T2I) generation using diffusion models has become a blockbuster service in today's AI cloud. A production T2I service typically involves a serving workflow where a base diffusion model is augmented with various "add-on" modules, notably ControlNet and LoRA, to enhance image generation control. Compared to serving the base model alone, these add-on modules introduce significant loading and computational overhead, resulting in increased latency. In this paper, we present SwiftDiffusion, a system that efficiently serves a T2I workflow through a holistic approach. SwiftDiffusion decouples ControlNet from the base model and deploys it as a separate, independently scaled service on dedicated GPUs, enabling ControlNet caching, parallelization, and sharing. To mitigate the high loading overhead of LoRA serving, SwiftDiffusion employs a bounded asynchronous LoRA loading (BAL) technique, allowing LoRA loading to overlap with the initial base model execution by up to k steps without compromising image quality. Furthermore, SwiftDiffusion optimizes base model execution with a novel latent parallelism technique. Collectively, these designs enable SwiftDiffusion to outperform the state-of-the-art T2I serving systems, achieving up to 7.8x latency reduction and 1.6x throughput improvement in serving SDXL models on H800 GPUs, without sacrificing image quality.
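
The bounded asynchronous LoRA loading idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `load_lora`, `denoise_step`, and `generate` are hypothetical stand-ins, the "latent" is a single float, and the only behavior modeled is the one the abstract states, namely that loading overlaps with the first denoising steps and the overlap is capped at k steps.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_lora(delay_s):
    """Hypothetical stand-in for fetching LoRA weights from storage."""
    time.sleep(delay_s)
    return {"scale": 0.5}


def denoise_step(latent, lora):
    """Toy denoising step: once the LoRA is patched in, the update changes."""
    update = 1.0 if lora is None else 1.0 + lora["scale"]
    return latent + update


def generate(num_steps, k, load_delay_s, step_time_s=0.01):
    """Run num_steps toy steps while the LoRA load overlaps at most k of them.

    Returns the final latent and the step index at which the LoRA was
    patched in (None if the loop ended before that happened).
    """
    latent, lora, patched_at = 0.0, None, None
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start loading in the background before the first step runs.
        future = pool.submit(load_lora, load_delay_s)
        for step in range(num_steps):
            if lora is None:
                if step >= k:
                    # Bound reached: stop overlapping and wait for the load.
                    lora = future.result()
                elif future.done():
                    # Load finished early: patch in without waiting.
                    lora = future.result()
                if lora is not None:
                    patched_at = step
            time.sleep(step_time_s)  # simulated compute time for this step
            latent = denoise_step(latent, lora)
    return latent, patched_at
```

With a slow load, for example `generate(10, k=3, load_delay_s=0.3, step_time_s=0.001)`, the first three steps run on the base model alone and step 3 blocks until the weights arrive. With a fast load, the LoRA is patched in as soon as it is ready, at some step no later than k.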

