DC · CV · Aug 8, 2025

KnapFormer: An Online Load Balancer for Efficient Diffusion Transformers Training

arXiv:2508.06001v1 · 2 citations · h-index: 14 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in distributed training for diffusion models, offering a practical solution for researchers and engineers working with large-scale, variable-length data.

The paper addresses token imbalance across ranks in distributed training of Diffusion Transformers, which leaves some GPUs idle while others finish their work. It presents KnapFormer, a framework that combines workload balancing with sequence parallelism and reports minimal communication overhead, less than 1% workload discrepancy, and a 2x to 3x speedup when training state-of-the-art models such as FLUX.

We present KnapFormer, an efficient and versatile framework to combine workload balancing and sequence parallelism in distributed training of Diffusion Transformers (DiT). KnapFormer builds on the insight that strong synergy exists between sequence parallelism and the need to address the significant token imbalance across ranks. This imbalance arises from variable-length text inputs and varying visual token counts in mixed-resolution and image-video joint training. KnapFormer redistributes tokens by first gathering sequence length metadata across all ranks in a balancing group and then solving a global knapsack problem. The solver aims to minimize the variance of the total per-GPU workload while accounting for the effect of sequence parallelism. By integrating DeepSpeed-Ulysses-based sequence parallelism into the load-balancing decision process and using a simple semi-empirical workload model, KnapFormer achieves minimal communication overhead and less than 1% workload discrepancy in real-world training workloads with sequence lengths varying from a few hundred to tens of thousands. It eliminates straggler effects and achieves a 2x to 3x speedup when training state-of-the-art diffusion models like FLUX on mixed-resolution and image-video joint data corpora. We open-source the KnapFormer implementation at https://github.com/Kai-46/KnapFormer/
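The balancing step described in the abstract can be illustrated with a toy sketch. This is not KnapFormer's solver: it swaps in a plain greedy longest-processing-time heuristic, ignores sequence parallelism entirely, and uses a made-up cost model. The function names `workload` and `balance`, and the coefficients `linear_coeff` and `quadratic_coeff`, are hypothetical and chosen only for illustration.

```python
import heapq


def workload(num_tokens, linear_coeff=1.0, quadratic_coeff=1e-4):
    """Hypothetical cost model (assumption, not from the paper):
    a linear term for per-token layers plus a quadratic term for attention."""
    return linear_coeff * num_tokens + quadratic_coeff * num_tokens ** 2


def balance(seq_lens, num_gpus):
    """Greedy heuristic: visit sequences from most to least expensive and
    give each one to the GPU with the smallest accumulated workload.

    Returns (assignment, loads), where assignment[g] lists the indices of the
    sequences placed on GPU g and loads[g] is that GPU's estimated workload.
    """
    # Min-heap of (accumulated load, gpu id); the least-loaded GPU pops first.
    heap = [(0.0, gpu) for gpu in range(num_gpus)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_gpus)]

    order = sorted(range(len(seq_lens)),
                   key=lambda i: workload(seq_lens[i]),
                   reverse=True)
    for idx in order:
        load, gpu = heapq.heappop(heap)
        assignment[gpu].append(idx)
        heapq.heappush(heap, (load + workload(seq_lens[idx]), gpu))

    loads = [sum(workload(seq_lens[i]) for i in group) for group in assignment]
    return assignment, loads


if __name__ == "__main__":
    # Sequence lengths from a few hundred to tens of thousands of tokens.
    lengths = [300, 12000, 800, 4000, 500, 20000, 1500, 9000]
    groups, loads = balance(lengths, num_gpus=4)
    for gpu, (group, load) in enumerate(zip(groups, loads)):
        print(f"GPU {gpu}: sequences {group}, estimated load {load:.0f}")
```

With lengths this skewed, the single longest sequence still dominates one GPU under whole-sequence assignment, because the quadratic term makes it far more expensive than everything else combined on any other device. That is the situation in which the abstract's coupling of balancing with sequence parallelism matters: splitting a long sequence across several GPUs is something this sketch cannot express.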

Code Implementations: 1 repo
