DC · AI · Apr 9

LegoDiffusion: Micro-Serving Text-to-Image Diffusion Workflows

arXiv:2604.08123 · 84.0
AI Analysis

The work addresses inefficient resource management and limited scalability in systems that serve diffusion workflows. It is incremental, since it builds on existing serving approaches.

The paper tackles inefficient serving of text-to-image diffusion workflows by proposing LegoDiffusion, a system that decomposes each workflow into independently managed model-execution nodes. This yields up to 3x higher sustained request rates and tolerance of up to 8x higher burst traffic.

Text-to-image generation executes a diffusion workflow comprising multiple models centered on a base diffusion model. Existing serving systems treat each workflow as an opaque monolith, provisioning, placing, and scaling all constituent models together, which obscures internal dataflow, prevents model sharing, and enforces coarse-grained resource management. In this paper, we make a case for micro-serving diffusion workflows with LegoDiffusion, a system that decomposes a workflow into loosely coupled model-execution nodes that can be independently managed and scheduled. By explicitly managing individual model inference, LegoDiffusion unlocks cluster-scale optimizations, including per-model scaling, model sharing, and adaptive model parallelism. Collectively, LegoDiffusion outperforms existing diffusion workflow serving systems, sustaining up to 3x higher request rates and tolerating up to 8x higher burst traffic.
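The abstract does not describe LegoDiffusion's scheduler or interfaces, so the following is only a minimal sketch of the general idea under stated assumptions. It contrasts provisioning each workflow as an opaque monolith with scaling each model node independently, where a model used by several workflows is shared and sized by their combined demand. The workflow names, model names, and per-replica throughputs are all hypothetical, and the sketch ignores GPU memory, placement, and adaptive model parallelism.

```python
import math
from collections import defaultdict

# Hypothetical workflows: each is a chain of model-execution nodes.
# Names are illustrative only, not taken from the paper.
WORKFLOWS = {
    "basic": ["text_encoder", "base_unet", "vae_decoder"],
    "refine": ["text_encoder", "base_unet", "refiner_unet", "vae_decoder"],
    "controlled": ["text_encoder", "controlnet", "base_unet", "vae_decoder"],
}

# Hypothetical per-replica throughput (requests/s) of each model.
CAPACITY = {
    "text_encoder": 50.0,
    "controlnet": 4.0,
    "base_unet": 2.0,
    "refiner_unet": 3.0,
    "vae_decoder": 10.0,
}


def monolithic_replicas(rates):
    """Scale each workflow as one opaque unit: copy the whole pipeline,
    sized by its slowest model, and count every model instance."""
    total = 0
    for wf, rate in rates.items():
        bottleneck = min(CAPACITY[m] for m in WORKFLOWS[wf])
        copies = math.ceil(rate / bottleneck)
        total += copies * len(WORKFLOWS[wf])  # all models copied together
    return total


def micro_serving_replicas(rates):
    """Treat each model as its own node: sum its demand across every
    workflow that uses it (model sharing), then scale it independently."""
    demand = defaultdict(float)
    for wf, rate in rates.items():
        for model in WORKFLOWS[wf]:
            demand[model] += rate
    return {m: math.ceil(d / CAPACITY[m]) for m, d in demand.items()}


if __name__ == "__main__":
    rates = {"basic": 3.0, "refine": 1.0, "controlled": 1.0}
    print("monolithic model instances:", monolithic_replicas(rates))  # 14
    per_model = micro_serving_replicas(rates)
    print("micro-serving per model:", per_model)
    print("micro-serving model instances:", sum(per_model.values()))  # 7
```

With these made-up numbers, the monolithic plan copies lightweight models such as the text encoder alongside every pipeline copy, while per-model scaling adds replicas only for the bottleneck model (`base_unet`) and shares the rest across workflows.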
