CV, LG | Sep 4, 2023

Relay Diffusion: Unifying diffusion process across resolutions for image synthesis

arXiv:2309.03350v1 | 90 citations | Has Code
Originality: Highly original
AI Analysis

This paper addresses a key bottleneck in image synthesis for AI applications, offering a novel method to raise resolution without restarting the diffusion process from pure noise, though it builds incrementally on existing diffusion models.

The paper tackles the challenge of high-resolution image generation in diffusion models by introducing the Relay Diffusion Model (RDM), which unifies the diffusion process across resolutions using blurring diffusion and block noise. RDM achieves state-of-the-art FID on CelebA-HQ and sFID on ImageNet 256×256.

Diffusion models achieved great success in image synthesis, but still face challenges in high-resolution generation. Through the lens of discrete cosine transformation, we find the main reason is that the same noise level on a higher resolution results in a higher Signal-to-Noise Ratio in the frequency domain. In this work, we present Relay Diffusion Model (RDM), which transfers a low-resolution image or noise into an equivalent high-resolution one for diffusion model via blurring diffusion and block noise. Therefore, the diffusion process can continue seamlessly in any new resolution or model without restarting from pure noise or low-resolution conditioning. RDM achieves state-of-the-art FID on CelebA-HQ and sFID on ImageNet 256×256, surpassing previous works such as ADM, LDM and DiT by a large margin. All the codes and checkpoints are open-sourced at https://github.com/THUDM/RelayDiffusion.
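
The resolution argument in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `block_noise` and `downsample_mean` are hypothetical, and "block noise" is read here simply as Gaussian noise that is constant within each block×block tile, which is an assumption about the abstract's term. The sketch shows that independent per-pixel noise loses most of its variance once a high-resolution image is averaged down to the coarse grid (so the low-frequency content sees a higher SNR), while block-constant noise keeps its full strength at the coarse scale.

```python
import numpy as np


def block_noise(height, width, block, rng):
    """Gaussian noise that is constant inside each block x block tile.

    Assumption: this is one plausible reading of "block noise", not the
    paper's exact definition.
    """
    if height % block or width % block:
        raise ValueError("height and width must be multiples of block")
    coarse = rng.standard_normal((height // block, width // block))
    # np.kron repeats every coarse sample over a block x block tile.
    return np.kron(coarse, np.ones((block, block)))


def downsample_mean(x, block):
    """Average-pool a 2D array over non-overlapping block x block tiles."""
    h, w = x.shape
    return x.reshape(h // block, block, w // block, block).mean(axis=(1, 3))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    block = 4

    iid = rng.standard_normal((256, 256))        # independent per-pixel noise
    blocky = block_noise(256, 256, block, rng)   # block-constant noise

    # Variance that survives at the coarse (low-resolution) scale.
    print("iid noise, pooled variance:  ", downsample_mean(iid, block).var())
    print("block noise, pooled variance:", downsample_mean(blocky, block).var())
```

For independent unit-variance noise, averaging over a block×block tile divides the variance by block², so the pooled variance of the first array should be close to 1/16 for `block = 4`, whereas pooling the block-constant noise returns the coarse samples unchanged.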

Code Implementations: 1 repo
