LG, AI · Sep 25, 2025

Score-based Idempotent Distillation of Diffusion Models

Shehtab Zaman, Chengyan Liu, Kenneth Chiu

arXiv:2509.21470v1 · h-index: 3

Originality: Incremental advance

AI Analysis

This work addresses efficiency and stability issues in generative modeling for image synthesis. It offers a method that reduces computational cost while maintaining quality, though the advance is incremental because it builds on existing distillation and idempotent-model concepts.

The paper tackles the high computational cost of diffusion models by distilling idempotent generative networks, called SIGN, from pre-trained diffusion models. It reports state-of-the-art results for idempotent models on the CIFAR and CelebA datasets, with faster inference than iterative score-based sampling and support for multi-step sampling.

Idempotent generative networks (IGNs) are a new line of generative models based on idempotent mapping to a target manifold. IGNs support both single- and multi-step generation, allowing for a flexible trade-off between computational cost and sample quality. However, like Generative Adversarial Networks (GANs), conventional IGNs require adversarial training and are prone to training instabilities and mode collapse. Diffusion and score-based models are popular approaches to generative modeling that iteratively transport samples from one distribution, usually a Gaussian, to a target data distribution. These models have gained popularity due to their stable training dynamics and high-fidelity generation quality. However, this stability and quality come at a high computational cost, as the data must be transported incrementally along the entire trajectory. New sampling methods, model distillation, and consistency models have been developed to reduce the sampling cost and even perform one-shot sampling from diffusion models. In this work, we unite diffusion and IGNs by distilling idempotent models, called SIGN, from diffusion model scores. Our proposed method is highly stable and does not require adversarial losses. We provide a theoretical analysis of our proposed score-based training methods and empirically show that IGNs can be effectively distilled from a pre-trained diffusion model, enabling faster inference than iterative score-based models. SIGNs can perform multi-step sampling, allowing users to trade off quality for efficiency. Because these models operate directly on the source domain, they can project corrupted or alternate distributions back onto the target manifold, enabling zero-shot editing of inputs. We validate our models on multiple image datasets, achieving state-of-the-art results for idempotent models on the CIFAR and CelebA datasets.
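
The idempotence property and the multi-step trade-off described in the abstract can be illustrated with a toy example. The sketch below is not the SIGN training procedure or any code from the paper. It assumes a hand-written stand-in for the target manifold (the unit circle in 2-D), and the names project_to_circle, partial_step, and multi_step are hypothetical. It only shows that an idempotent map f satisfies f(f(x)) = f(x), and that repeatedly applying an imperfect map can move samples closer to the manifold at the price of extra evaluations.

```python
import numpy as np

def project_to_circle(x, eps=1e-12):
    """Toy idempotent map (assumption): project 2-D points onto the unit circle."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

def partial_step(x, strength=0.5):
    """Hypothetical imperfect map: moves each point only part of the way to the circle."""
    return x + strength * (project_to_circle(x) - x)

def multi_step(f, x, steps):
    """Apply the map f repeatedly; more steps cost more evaluations."""
    for _ in range(steps):
        x = f(x)
    return x

def distance_to_circle(x):
    """Mean absolute distance of the points from the unit circle."""
    return np.abs(np.linalg.norm(x, axis=-1) - 1.0).mean()

rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 2))  # Gaussian source samples

# Idempotence: applying the exact projection twice changes nothing (up to rounding).
y = project_to_circle(z)
idempotence_gap = np.abs(project_to_circle(y) - y).max()

# Multi-step use of the imperfect map: error shrinks as the step count grows.
errors = [distance_to_circle(multi_step(partial_step, z, k)) for k in (1, 2, 4)]

print("idempotence gap:", idempotence_gap)
print("mean distance to manifold after 1, 2, 4 steps:", errors)
```

In this toy setting each application of partial_step halves the remaining distance to the circle, so additional steps buy accuracy at a linear cost in evaluations. A learned model has no such guarantee; the example only makes the quality-versus-compute trade-off concrete.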
