Pyramidal Denoising Diffusion Probabilistic Models
This work addresses the computational cost that researchers and practitioners face when using diffusion models in computer vision, and it offers an incremental improvement.
The authors tackle the high computational cost of training and evaluating diffusion models by introducing a pyramidal diffusion model that generates high-resolution images from coarser inputs using a single score function with a positional embedding, achieving time-efficient generation without performance loss.
Recently, diffusion models have demonstrated impressive image generation performance and have been extensively studied in various computer vision tasks. Unfortunately, training and evaluating diffusion models consumes a lot of time and computational resources. To address this problem, we present a novel pyramidal diffusion model that can generate high-resolution images starting from much coarser-resolution images using a {\em single} score function trained with a positional embedding. This allows the neural network to be much lighter and enables time-efficient image generation without compromising performance. Furthermore, we show that the proposed approach can also be used efficiently for multi-scale super-resolution problems using a single score function.
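To make the idea of a resolution-aware positional embedding concrete, the following is a minimal sketch and not the authors' implementation. It assumes a sinusoidal encoding of normalized pixel coordinates that is concatenated to the image as extra channels, so that one network can receive inputs of any resolution together with information about where each pixel lies in a shared continuous domain. The function names `coord_positional_encoding` and `with_position`, the parameter `num_freqs`, and the choice of frequencies are all hypothetical; the abstract does not specify the form of the embedding.

```python
import numpy as np

def coord_positional_encoding(height, width, num_freqs=4):
    """Sinusoidal encoding of normalized pixel-centre coordinates.

    Assumption (not from the paper): coordinates are mapped to [-1, 1], so a
    coarse grid and a fine grid sample the same continuous domain.
    Returns an array of shape (4 * num_freqs, height, width).
    """
    ys = (np.arange(height) + 0.5) / height * 2.0 - 1.0
    xs = (np.arange(width) + 0.5) / width * 2.0 - 1.0
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    feats = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * np.pi  # hypothetical frequency schedule
        feats += [np.sin(freq * xx), np.cos(freq * xx),
                  np.sin(freq * yy), np.cos(freq * yy)]
    return np.stack(feats, axis=0)

def with_position(image, num_freqs=4):
    """Append positional channels to an image of shape (C, H, W)."""
    _, h, w = image.shape
    pe = coord_positional_encoding(h, w, num_freqs)
    return np.concatenate([image, pe], axis=0)

# The channel count is the same at every resolution, so a single
# (hypothetical) score network could consume both inputs.
coarse = with_position(np.zeros((3, 16, 16)))
fine = with_position(np.zeros((3, 64, 64)))
print(coarse.shape, fine.shape)
```

Because the number of channels does not depend on the spatial size, the same convolutional network could in principle be applied at each level of a resolution pyramid; how the actual model conditions on position may differ from this sketch.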