CV · Aug 2, 2023

Patched Denoising Diffusion Models For High-Resolution Image Synthesis

Stanford
arXiv:2308.01316v1 · 56 citations · h-index: 77
Originality: Incremental advance
AI Analysis

This work addresses the memory and quality challenges of high-resolution image synthesis, with applications in computer vision and graphics. It is an incremental advance that builds on existing patch-based and diffusion methods.

The authors generate high-resolution images with denoising diffusion models trained on small patches, and introduce a feature collage strategy to avoid boundary artifacts. Compared with previous patch-based generation methods, they report state-of-the-art FID scores on four datasets: a newly collected nature image dataset and the standard benchmarks LSUN-Bedroom, LSUN-Church, and FFHQ.

We propose an effective denoising diffusion model for generating high-resolution images (e.g., 1024$\times$512), trained on small-size image patches (e.g., 64$\times$64). We name our algorithm Patch-DM, in which a new feature collage strategy is designed to avoid the boundary artifact when synthesizing large-size images. Feature collage systematically crops and combines partial features of the neighboring patches to predict the features of a shifted image patch, allowing the seamless generation of the entire image due to the overlap in the patch feature space. Patch-DM produces high-quality image synthesis results on our newly collected dataset of nature images (1024$\times$512), as well as on standard benchmarks of smaller sizes (256$\times$256), including LSUN-Bedroom, LSUN-Church, and FFHQ. We compare our method with previous patch-based generation methods and achieve state-of-the-art FID scores on all four datasets. Further, Patch-DM also reduces memory complexity compared to the classic diffusion models.
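The feature collage idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `(rows, cols, C, h, w)` feature layout, and the half-patch shift are assumptions made here for illustration, and the sketch operates on a single feature grid rather than inside a denoising network. It shows only the cropping and combining step: the features of a shifted patch are assembled from the adjacent parts of its four neighbors.

```python
import numpy as np


def collage_shifted_features(feats, i, j):
    """Assemble features for the window shifted half a patch down and right of patch (i, j).

    feats: array of shape (rows, cols, C, h, w), one feature map per
           non-overlapping patch (hypothetical layout chosen for this sketch).
    Returns an array of shape (C, h, w).
    """
    h, w = feats.shape[-2:]
    hh, hw = h // 2, w // 2
    # Each quadrant of the shifted window comes from a different neighbor.
    top_left = feats[i, j, :, hh:, hw:]              # bottom-right part of (i, j)
    top_right = feats[i, j + 1, :, hh:, :hw]         # bottom-left part of (i, j+1)
    bottom_left = feats[i + 1, j, :, :hh, hw:]       # top-right part of (i+1, j)
    bottom_right = feats[i + 1, j + 1, :, :hh, :hw]  # top-left part of (i+1, j+1)
    top = np.concatenate([top_left, top_right], axis=-1)
    bottom = np.concatenate([bottom_left, bottom_right], axis=-1)
    return np.concatenate([top, bottom], axis=-2)


def tile_features(feats):
    """Stitch per-patch features into one (C, rows*h, cols*w) map, for checking only."""
    rows, cols, c, h, w = feats.shape
    return feats.transpose(2, 0, 3, 1, 4).reshape(c, rows * h, cols * w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(3, 4, 2, 8, 8))  # small toy grid of patch features
    shifted = collage_shifted_features(feats, 1, 2)
    print(shifted.shape)
```

Because the shifted window overlaps all four neighbors in feature space, the collage matches a crop of the stitched feature map at a half-patch offset, which is the property that lets neighboring patches agree at their boundaries. At the sizes quoted in the abstract, a 1024×512 image split into 64×64 patches gives a 16×8 grid of 128 patches.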

Code Implementations: 1 repo