CV · May 31, 2025

SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation

Xingtong Ge, Xin Zhang, Tongda Xu, Yi Zhang, Xinjie Zhang, Yan Wang, Jun Zhang

arXiv:2506.00523v1

Originality: Incremental advance

AI Analysis

This work addresses a scalability challenge in distilling text-to-image models. The advance is incremental: it builds on existing Distribution Matching Distillation (DMD) methods and extends them to larger models.

The paper tackles the convergence difficulties of Distribution Matching Distillation (DMD) on large-scale flow-based text-to-image models such as SD 3.5 and FLUX. It proposes implicit distribution alignment (IDA) and intra-segment guidance (ISG), which enable DMD to converge on these models and yield superior distillation performance for SDXL, SD 3.5 Large, and FLUX.

Distribution Matching Distillation (DMD) has been successfully applied to text-to-image diffusion models such as Stable Diffusion (SD) 1.5. However, vanilla DMD suffers from convergence difficulties on large-scale flow-based text-to-image models such as SD 3.5 and FLUX. In this paper, we first analyze the issues that arise when applying vanilla DMD to large-scale models. Then, to overcome this scalability challenge, we propose implicit distribution alignment (IDA) to regularize the distance between the generator and the fake distribution. Furthermore, we propose intra-segment guidance (ISG) to relocate the timestep importance distribution from the teacher model. With IDA alone, DMD converges for SD 3.5; with both IDA and ISG, DMD converges for SD 3.5 and FLUX.1 dev. Together with other improvements such as scaled-up discriminator models, our final model, dubbed SenseFlow, achieves superior performance in distillation for both diffusion-based text-to-image models such as SDXL and flow-matching models such as SD 3.5 Large and FLUX. The source code will be available at https://github.com/XingtongGe/SenseFlow.
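The abstract takes the vanilla DMD update as its starting point without spelling it out. The toy below is a deliberately tiny 1-D sketch of that update under heavy assumptions; it is not SenseFlow's IDA or ISG and not the authors' code. The generator is a learnable shift `b`, the real and fake score models are replaced by analytic Gaussian scores, the per-timestep loss weighting is omitted, and the fake model is "trained" by moving its mean toward the generator's sample mean. The function name `dmd_toy` and all of its parameters are hypothetical. What it does illustrate is the core DMD direction: the generator is pushed along the difference between the fake and real scores evaluated on noised generator samples, while the fake model is refit to the generator in alternation.

```python
import numpy as np


def gaussian_score(x, mean, var):
    """Score (gradient of the log-density) of N(mean, var) evaluated at x."""
    return -(x - mean) / var


def dmd_toy(real_mean=3.0, steps=200, lr=0.1, fake_lr=0.5, batch=256, seed=0):
    """Hypothetical 1-D toy of a vanilla DMD training loop (not SenseFlow).

    Generator: x = z + b with z ~ N(0, 1); only the shift b is trained.
    Real score: analytic score of N(real_mean, 1) diffused to noise level sigma.
    Fake score: analytic Gaussian score whose mean `fake_mean` is refit toward
    the generator's samples each step, standing in for a trained fake denoiser.
    Per-timestep weighting from the full method is omitted (assumption).
    """
    rng = np.random.default_rng(seed)
    b, fake_mean = 0.0, 0.0
    for _ in range(steps):
        z = rng.standard_normal(batch)
        x = z + b                                   # one-step generator samples
        sigma = rng.uniform(0.1, 2.0)               # random noise level ("timestep")
        x_t = x + sigma * rng.standard_normal(batch)
        var_t = 1.0 + sigma ** 2                    # variance of both diffused marginals
        # Vanilla DMD direction: s_fake - s_real on noised generator samples.
        grad_x = (gaussian_score(x_t, fake_mean, var_t)
                  - gaussian_score(x_t, real_mean, var_t))
        # dx_t/db = 1 here, so the chain rule reduces to averaging grad_x.
        b -= lr * np.mean(grad_x)
        # Fake-model step: move fake_mean toward the current generator samples.
        fake_mean += fake_lr * (np.mean(x) - fake_mean)
    return b, fake_mean
```

Running `b, fake_mean = dmd_toy()` should drive both `b` and `fake_mean` close to `real_mean`. In this toy the fake model tracks the generator closely and the loop converges; the abstract reports that vanilla DMD fails to converge on large flow-based models and introduces IDA to regularize the distance between the generator and the fake distribution, a failure this sketch does not reproduce.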

