CV · Aug 2, 2023

Patched Denoising Diffusion Models For High-Resolution Image Synthesis

Zheng Ding, Mengqi Zhang, Jiajun Wu, Zhuowen Tu

Stanford

arXiv:2308.01316v1 · 21.856 citations · h-index: 77 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the memory and quality challenges of high-resolution image synthesis, which matter for applications in computer vision and graphics. It is an incremental advance, since it builds on existing patch-based and diffusion methods.

The authors generate high-resolution images with denoising diffusion models by training on small patches and introducing a feature collage strategy that avoids boundary artifacts. The method achieves state-of-the-art FID scores among patch-based generation methods on four datasets: a newly collected nature image dataset and the standard benchmarks LSUN-Bedroom, LSUN-Church, and FFHQ.

We propose an effective denoising diffusion model for generating high-resolution images (e.g., 1024$\times$512), trained on small-size image patches (e.g., 64$\times$64). We name our algorithm Patch-DM, in which a new feature collage strategy is designed to avoid the boundary artifact when synthesizing large-size images. Feature collage systematically crops and combines partial features of the neighboring patches to predict the features of a shifted image patch, allowing the seamless generation of the entire image due to the overlap in the patch feature space. Patch-DM produces high-quality image synthesis results on our newly collected dataset of nature images (1024$\times$512), as well as on standard benchmarks of smaller sizes (256$\times$256), including LSUN-Bedroom, LSUN-Church, and FFHQ. We compare our method with previous patch-based generation methods and achieve state-of-the-art FID scores on all four datasets. Further, Patch-DM also reduces memory complexity compared to the classic diffusion models.
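The feature collage step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a grid of non-overlapping square patches, a shift of exactly half a patch in each direction, and a single feature resolution. The function name `collage_shifted_feature` and the array layout are hypothetical, chosen only for illustration.

```python
import numpy as np

def collage_shifted_feature(feats, i, j):
    """Assemble the feature map of a shifted patch from four neighbours.

    feats: array of shape (rows, cols, p, p, c) holding one (p, p, c)
           feature map per grid patch (hypothetical layout).
    Returns the (p, p, c) feature of the patch shifted half a patch
    down and right from grid cell (i, j). Assumes p is even and that
    cells (i + 1, j + 1) exist.
    """
    p = feats.shape[2]
    h = p // 2
    top_left = feats[i, j, h:, h:]              # bottom-right quarter
    top_right = feats[i, j + 1, h:, :h]         # bottom-left quarter
    bottom_left = feats[i + 1, j, :h, h:]       # top-right quarter
    bottom_right = feats[i + 1, j + 1, :h, :h]  # top-left quarter
    top = np.concatenate([top_left, top_right], axis=1)
    bottom = np.concatenate([bottom_left, bottom_right], axis=1)
    return np.concatenate([top, bottom], axis=0)
```

Because each shifted patch reuses quarters of its neighbours' features, adjacent predictions overlap in feature space, which is the property the abstract credits for seamless boundaries.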
