CV · LG · Dec 12, 2023

One-Step Diffusion Distillation via Deep Equilibrium Models

DeepMind
arXiv:2401.08639v1 · 70 citations · h-index: 75 · Has Code · NIPS
Originality: Highly original
AI Analysis

This work addresses the challenge of accelerating diffusion models for practical generative applications, offering a simple and effective solution for researchers and practitioners in AI and computer vision.

The paper tackles slow sampling in diffusion models by introducing a one-step distillation method built on a Deep Equilibrium model, the Generative Equilibrium Transformer (GET). GET outperforms existing one-step methods at comparable training budgets and matches the FID scores of a 5× larger ViT.

Diffusion models excel at producing high-quality samples but naively require hundreds of iterations, prompting multiple attempts to distill the generation process into a faster network. However, many existing approaches suffer from a variety of challenges: the process for distillation training can be complex, often requiring multiple training stages, and the resulting models perform poorly when utilized in single-step generative applications. In this paper, we introduce a simple yet effective means of distilling diffusion models directly from initial noise to the resulting image. Of particular importance to our approach is to leverage a new Deep Equilibrium (DEQ) model as the distilled architecture: the Generative Equilibrium Transformer (GET). Our method enables fully offline training with just noise/image pairs from the diffusion model while achieving superior performance compared to existing one-step methods on comparable training budgets. We demonstrate that the DEQ architecture is crucial to this capability, as GET matches a $5\times$ larger ViT in terms of FID scores while striking a critical balance of computational cost and image quality. Code, checkpoints, and datasets are available.
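As a rough illustration of what "equilibrium" means in a Deep Equilibrium model, the sketch below solves for a fixed point z* = f(z*, x) by plain iteration, where x stands in for the initial noise. This is not the GET architecture or its solver: the toy layer `tanh(W z + x)`, the matrix `W`, and the helper names `fixed_point` and `layer` are all hypothetical, chosen only so that the map is a contraction and the iteration provably converges.

```python
import math


def fixed_point(f, x, z0, tol=1e-8, max_iter=200):
    """Iterate z <- f(z, x) until successive iterates differ by less than tol."""
    z = list(z0)
    for step in range(1, max_iter + 1):
        z_next = f(z, x)
        diff = max(abs(a - b) for a, b in zip(z_next, z))
        z = z_next
        if diff < tol:
            break
    return z, step


# Hypothetical toy weights. Each row's absolute values sum to less than 1,
# and tanh is 1-Lipschitz, so the layer below is a contraction.
W = [[0.2, -0.1],
     [0.05, 0.3]]


def layer(z, x):
    """One application of the toy implicit layer: tanh(W z + x)."""
    return [
        math.tanh(sum(w * zj for w, zj in zip(row, z)) + xi)
        for row, xi in zip(W, x)
    ]


noise = [0.5, -1.0]  # stands in for the initial noise input
z_star, n_steps = fixed_point(layer, noise, [0.0, 0.0])

# At equilibrium, applying the layer once more leaves z_star (almost) unchanged.
residual = max(abs(a - b) for a, b in zip(layer(z_star, noise), z_star))
print(n_steps, residual)
```

In a distillation setting like the one the abstract describes, the output read off such an equilibrium would be regressed against the image paired with each noise sample. The loss and training loop are omitted here because the abstract does not specify them.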

Code Implementations: 1 repo
