DCAI · Nov 4, 2024

xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

arXiv:2411.01738v1 · 53 citations · h-index: 4 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of deploying DiTs in real time for image and video generation. The contribution is incremental, since it builds on existing parallel methods.

The paper tackles the high inference latency of Diffusion Transformers (DiTs) by introducing xDiT, a parallel inference engine that combines multiple parallel strategies, achieving exceptional scalability on Ethernet-connected GPU clusters and reducing latency by up to 80% in experiments.
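xDiT combines Sequence Parallel (SP), PipeFusion, and CFG parallel. One way to picture a hybrid setup is as a factorization of the GPU count into one degree per strategy. The sketch below rests on that assumption and enumerates candidate (cfg, pipefusion, sp) triples. The function name and tuple layout are hypothetical and do not represent xDiT's configuration interface. The cap of 2 on the CFG degree reflects the two branches (conditional and unconditional) of classifier-free guidance.

```python
def hybrid_configs(world_size: int, use_cfg: bool = True) -> list[tuple[int, int, int]]:
    """Enumerate (cfg, pipefusion, sp) degree triples whose product equals world_size.

    Hypothetical helper for illustration only; not part of xDiT.
    CFG parallel is capped at 2 because classifier-free guidance runs
    exactly two branches (conditional and unconditional).
    """
    cfg_options = [1, 2] if use_cfg else [1]
    configs = []
    for cfg in cfg_options:
        if world_size % cfg:
            continue  # CFG degree must divide the GPU count
        rest = world_size // cfg
        for pipefusion in range(1, rest + 1):
            if rest % pipefusion == 0:
                # The remaining factor goes to sequence parallelism
                configs.append((cfg, pipefusion, rest // pipefusion))
    return configs


# Two 8-GPU nodes give 16 GPUs in total
for cfg, pp, sp in hybrid_configs(16):
    print(f"cfg={cfg} pipefusion={pp} sp={sp}")
```

The fastest triple depends on the model, the resolution, and the interconnect (for example, PCIe plus Ethernet versus NVLink). This sketch only lists the valid combinations and does not pick among them.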

Diffusion models are pivotal for generating high-quality images and videos. Following the success of OpenAI's Sora, the backbone of diffusion models is shifting from U-Net to Transformer, giving rise to Diffusion Transformers (DiTs). However, generating high-quality content requires longer sequence lengths, which quadratically increases the computation required by the attention mechanism and drives up DiT inference latency. Parallel inference is therefore essential for real-time DiT deployment, but relying on a single parallel method is impractical because no single method scales well to many GPUs. This paper introduces xDiT, a comprehensive parallel inference engine for DiTs. After thoroughly investigating existing DiT parallel approaches, xDiT adopts Sequence Parallel (SP) and PipeFusion, a novel patch-level pipeline parallel method, as intra-image parallel strategies, alongside CFG parallel for inter-image parallelism. xDiT can flexibly combine these approaches in a hybrid manner, offering a robust and scalable solution. Experimental results on two 8×L40 GPU (PCIe) nodes interconnected by Ethernet and on one 8×A100 (NVLink) node showcase xDiT's exceptional scalability across five state-of-the-art DiTs. Notably, we are the first to demonstrate DiT scalability on Ethernet-connected GPU clusters. xDiT is available at https://github.com/xdit-project/xDiT.
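The core cost of self-attention, computing the scores QKᵀ and then the weighted sum with V, grows with the square of the sequence length L. The rough sketch below estimates that cost and shows how a sequence-parallel degree P divides it across devices. It assumes a ring-attention-style split, where each device holds L/P query tokens but still attends over all L keys. Ulysses-style SP splits by attention heads instead, but the per-device total comes out the same. The FLOP formula ignores the QKV and output projections and the softmax. The hidden size of 1152 is an arbitrary example value.

```python
def attention_flops(seq_len: int, hidden: int, sp_degree: int = 1) -> int:
    """Approximate per-device FLOPs of one self-attention core (QK^T and AV).

    Counts a multiply-add as 2 FLOPs; ignores projections and softmax.
    With sequence parallelism of degree sp_degree, each device owns
    seq_len // sp_degree query tokens but attends over all seq_len keys.
    """
    if seq_len % sp_degree:
        raise ValueError("seq_len must be divisible by sp_degree")
    local_queries = seq_len // sp_degree
    qk = 2 * local_queries * seq_len * hidden  # scores: (L/P x d) @ (d x L)
    av = 2 * local_queries * seq_len * hidden  # weighted sum: (L/P x L) @ (L x d)
    return qk + av


base = attention_flops(4096, 1152)
print(attention_flops(8192, 1152) / base)          # doubling L quadruples the cost
print(attention_flops(4096, 1152, sp_degree=4) * 4 == base)  # SP splits it evenly
```

Because the cost per image grows quadratically while SP only divides it linearly by P, longer sequences push toward more devices. This is the scaling pressure the abstract describes.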

Code Implementations: 1 repo