Improving Diffusion Model Efficiency Through Patching
This work addresses the efficiency of diffusion models for image generation. The contribution is incremental, since it builds on existing methods.
The paper tackles the high computational cost of each iteration in diffusion models by introducing a ViT-style patching transformation. This reduces sampling time and memory usage, as shown empirically on LSUN Church, ImageNet 256, and FFHQ 1024.
Diffusion models are a powerful class of generative models that iteratively denoise samples to produce data. While many works have focused on the number of iterations in this sampling procedure, few have focused on the cost of each iteration. We find that adding a simple ViT-style patching transformation can considerably reduce a diffusion model's sampling time and memory usage. We justify our approach both through an analysis of the diffusion model objective and through empirical experiments on LSUN Church, ImageNet 256, and FFHQ 1024. We provide implementations in TensorFlow and PyTorch.
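To make the idea of a ViT-style patching transformation concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names `patchify` and `unpatchify`, the channels-last layout, the patch size of 4, and the ordering of pixels within each patch are all assumptions chosen for illustration. The sketch folds each non-overlapping P x P block of pixels into the channel dimension, so the denoising network would see a grid with P^2 times fewer spatial positions, and then inverts the operation exactly.

```python
import numpy as np


def patchify(x: np.ndarray, p: int) -> np.ndarray:
    """Fold non-overlapping p x p pixel blocks into channels.

    (H, W, C) -> (H/p, W/p, p*p*C). Layout and ordering are illustrative
    assumptions, not taken from the paper's released code.
    """
    h, w, c = x.shape
    if h % p != 0 or w % p != 0:
        raise ValueError("height and width must be divisible by the patch size")
    x = x.reshape(h // p, p, w // p, p, c)   # split rows and columns into blocks
    x = x.transpose(0, 2, 1, 3, 4)           # (block_row, block_col, row, col, C)
    return x.reshape(h // p, w // p, p * p * c)


def unpatchify(z: np.ndarray, p: int) -> np.ndarray:
    """Exact inverse of patchify: (H/p, W/p, p*p*C) -> (H, W, C)."""
    hp, wp, d = z.shape
    c = d // (p * p)
    z = z.reshape(hp, wp, p, p, c)           # recover (row, col, C) inside each block
    z = z.transpose(0, 2, 1, 3, 4)           # (block_row, row, block_col, col, C)
    return z.reshape(hp * p, wp * p, c)


# Usage: a 256 x 256 RGB image with an assumed patch size of 4.
image = np.random.default_rng(0).normal(size=(256, 256, 3))
patched = patchify(image, 4)
print(patched.shape)                                  # (64, 64, 48)
print(np.array_equal(unpatchify(patched, 4), image))  # True
```

Because the transformation only rearranges values, no information is lost. Under these assumptions, the network would operate on the patched tensor, and its output would be unpatched back to image resolution at each sampling step.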