CV | GR | LG | May 9, 2024

Distilling Diffusion Models into Conditional GANs

arXiv:2405.05967v3 | 95 citations | ECCV
AI Analysis

This work addresses the computational bottleneck of diffusion models for image generation, enabling real-time applications, though it builds incrementally on existing distillation techniques.

The authors tackled the problem of slow inference in diffusion models by distilling them into single-step conditional GANs, achieving faster generation while maintaining image quality and outperforming state-of-the-art one-step models like DMD, SDXL-Turbo, and SDXL-Lightning on the zero-shot COCO benchmark.

We propose a method to distill a complex multistep diffusion model into a single-step conditional GAN student model, dramatically accelerating inference while preserving image quality. Our approach interprets diffusion distillation as a paired image-to-image translation task, using noise-to-image pairs of the diffusion model's ODE trajectory. For efficient regression loss computation, we propose E-LatentLPIPS, a perceptual loss operating directly in the diffusion model's latent space, utilizing an ensemble of augmentations. Furthermore, we adapt a diffusion model to construct a multi-scale discriminator with a text alignment loss to build an effective conditional GAN-based formulation. E-LatentLPIPS converges more efficiently than many existing distillation methods, even accounting for dataset construction costs. We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models -- DMD, SDXL-Turbo, and SDXL-Lightning -- on the zero-shot COCO benchmark.
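
The regression step described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. It assumes that "an ensemble of augmentations" means the same random augmentation is applied to both the student's output and the teacher's target latent before a distance is measured, with the result averaged over several draws. The names `ensembled_loss`, `random_augment`, `hflip`, `roll_cols`, and `mse` are hypothetical, latents are plain nested lists, and `mse` is only a placeholder for a learned latent perceptual metric.

```python
import random
from typing import Callable, List, Optional

Latent = List[List[float]]  # toy single-channel latent, H x W


def hflip(z: Latent) -> Latent:
    """Mirror each row left-to-right."""
    return [row[::-1] for row in z]


def roll_cols(z: Latent, k: int) -> Latent:
    """Cyclically shift columns to the right by k positions."""
    k %= len(z[0])
    return [row[-k:] + row[:-k] for row in z]


def mse(a: Latent, b: Latent) -> float:
    """Placeholder metric (assumption): mean squared difference.
    A learned perceptual metric on latents would be plugged in here."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def random_augment(rng: random.Random, width: int) -> Callable[[Latent], Latent]:
    """Draw one augmentation (optional flip, then a cyclic shift)."""
    flip = rng.random() < 0.5
    shift = rng.randrange(width)

    def apply(z: Latent) -> Latent:
        out = hflip(z) if flip else z
        return roll_cols(out, shift)

    return apply


def ensembled_loss(
    pred: Latent,
    target: Latent,
    metric: Callable[[Latent, Latent], float],
    n_aug: int = 4,
    rng: Optional[random.Random] = None,
) -> float:
    """Average the metric over n_aug random augmentations, applying the
    SAME augmentation to the prediction and to the target each time."""
    rng = rng or random.Random(0)
    width = len(pred[0])
    total = 0.0
    for _ in range(n_aug):
        aug = random_augment(rng, width)
        total += metric(aug(pred), aug(target))
    return total / n_aug
```

In the setting the abstract describes, `pred` would be the one-step generator's output for a given noise input and `target` the latent the teacher's ODE trajectory produces from that same noise. With the placeholder `mse`, which ignores pixel positions, the ensemble has no effect on the value; it only matters once `metric` is a spatially sensitive feature-based distance.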

