LG, CV · Aug 20, 2025

Disentanglement in T-space for Faster and Distributed Training of Diffusion Models with Fewer Latent-states

arXiv:2508.14413v1 · h-index: 11
Originality: Highly original
AI Analysis

This work targets the high computational cost of training diffusion models, giving researchers and practitioners a way to train faster, in a distributed fashion, and with fewer resources.

The paper challenges the assumption that diffusion models need many latent states for training. It shows that, with a carefully chosen noise schedule, models trained over as few as 32 latent states match the performance of models trained over 1,000. It then pushes this to a single latent state per model via disentanglement, combining independently trained models and reporting 4-6x faster convergence on two datasets.
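To make "training over 32 latent states" concrete, the sketch below discretises a noise schedule into T steps and returns the per-step noise levels. The abstract does not say which schedule the authors use; the cosine schedule of Nichol & Dhariwal (2021) is assumed here purely for illustration, and the function names `cosine_alpha_bar` and `discrete_schedule` are hypothetical.

```python
import math


def cosine_alpha_bar(t: float, s: float = 0.008) -> float:
    """Cumulative signal fraction alpha_bar at continuous time t in [0, 1].

    Assumed schedule: the cosine schedule of Nichol & Dhariwal (2021),
    not necessarily the one used in the paper.
    """
    def f(u: float) -> float:
        return math.cos((u + s) / (1 + s) * math.pi / 2) ** 2

    return f(t) / f(0.0)


def discrete_schedule(T: int, max_beta: float = 0.999) -> tuple[list[float], list[float]]:
    """Return (betas, alpha_bars) for a T-step discretisation of the schedule."""
    betas, alpha_bars = [], []
    prod = 1.0
    for i in range(1, T + 1):
        # beta_i is the fraction of signal removed between latent states i-1 and i;
        # clipping avoids beta = 1 at the final step, where alpha_bar reaches ~0.
        beta = min(1 - cosine_alpha_bar(i / T) / cosine_alpha_bar((i - 1) / T), max_beta)
        prod *= 1 - beta
        betas.append(beta)
        alpha_bars.append(prod)
    return betas, alpha_bars


# Both discretisations run from almost-clean data (alpha_bar near 1)
# to almost-pure noise (alpha_bar near 0); T = 32 just takes coarser steps.
betas_32, alpha_bars_32 = discrete_schedule(32)
betas_1000, alpha_bars_1000 = discrete_schedule(1000)
```

The design point this illustrates is that the number of latent states only sets how finely the same noise range is sampled; whether 32 coarse steps suffice depends on the schedule, which is the paper's claim.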

We challenge a fundamental assumption of diffusion models, namely, that a large number of latent-states or time-steps is required for training so that the reverse generative process is close to a Gaussian. We first show that with careful selection of a noise schedule, diffusion models trained over a small number of latent states (i.e. $T \sim 32$) match the performance of models trained over a much larger number of latent states ($T \sim 1{,}000$). Second, we push this limit (on the minimum number of latent states required) to a single latent-state, which we refer to as complete disentanglement in T-space. We show that high-quality samples can be easily generated by the disentangled model obtained by combining several independently trained single latent-state models. We provide extensive experiments to show that the proposed disentangled model provides 4-6$\times$ faster convergence measured across a variety of metrics on two different datasets.
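The abstract says the disentangled model combines independently trained single latent-state models but does not spell out how. The sketch below shows one plausible reading, stated as an assumption: each model predicts the noise at exactly one noise level, and a deterministic DDIM-style sampler (eta = 0) routes step t to model t only. Trained networks are replaced by hypothetical closed-form "oracle" predictors for a one-point toy dataset, so the combined sampler should recover that point exactly; `make_oracle_model` and `ddim_sample` are illustrative names, not the authors' code.

```python
import math
import random
from typing import Callable, Dict, List

# A noise (epsilon) predictor for a single noise level: noisy sample -> predicted noise.
EpsModel = Callable[[float], float]


def make_oracle_model(alpha_bar: float, x0: float) -> EpsModel:
    """Hypothetical stand-in for one independently trained single-latent-state
    model: the exact noise predictor when the data is the single point x0."""
    def predict(x_t: float) -> float:
        return (x_t - math.sqrt(alpha_bar) * x0) / math.sqrt(1.0 - alpha_bar)
    return predict


def ddim_sample(models: Dict[int, EpsModel], alpha_bars: List[float], x_T: float) -> float:
    """Deterministic DDIM sampling (eta = 0) that dispatches step t to models[t]."""
    x = x_T
    for t in range(len(alpha_bars) - 1, -1, -1):
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps = models[t](x)  # only the model responsible for level t is queried
        x0_hat = (x - math.sqrt(1.0 - ab_t) * eps) / math.sqrt(ab_t)
        x = math.sqrt(ab_prev) * x0_hat + math.sqrt(1.0 - ab_prev) * eps
    return x


# Toy setup (assumed values): five noise levels, data concentrated at 1.5.
alpha_bars = [0.99, 0.9, 0.6, 0.3, 0.05]
target = 1.5
models = {t: make_oracle_model(ab, target) for t, ab in enumerate(alpha_bars)}
random.seed(0)
sample = ddim_sample(models, alpha_bars, x_T=random.gauss(0.0, 1.0))
print(round(sample, 6))  # 1.5
```

Because no model ever sees another model's noise level, each one could in principle be trained on a separate worker, which is the sense in which this reading connects disentanglement in T-space to distributed training.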
