CV | AI | Apr 2, 2024

Upsample Guidance: Scale Up Diffusion Models without Training

arXiv:2404.01709v1 | 24 citations | h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in scaling diffusion models to high-resolution generation. It offers a training-free method that applies across several model types, though the advance is incremental because it builds on existing sampling processes.

The paper tackles high-resolution sampling with diffusion models without additional training. It introduces upsample guidance, a technique that adapts a pre-trained model to produce higher-resolution images (e.g., from 512^2 to 1536^2, a 3x increase per side) by adding a single term during sampling.

Diffusion models have demonstrated superior performance across various generative tasks, including images, videos, and audio. However, they encounter difficulties in directly generating high-resolution samples. Previously proposed solutions to this issue involve modifying the architecture, further training, or partitioning the sampling process into multiple stages. These methods cannot use pre-trained models as-is and require additional work. In this paper, we introduce upsample guidance, a technique that adapts a pretrained diffusion model (e.g., $512^2$) to generate higher-resolution images (e.g., $1536^2$) by adding only a single term in the sampling process. Remarkably, this technique does not require any additional training or reliance on external models. We demonstrate that upsample guidance can be applied to various models, such as pixel-space, latent-space, and video diffusion models. We also observe that a proper choice of guidance scale can improve image quality, fidelity, and prompt alignment.
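The abstract does not spell out the added term, so the following is only a rough sketch of what a guidance-style correction of this kind could look like, not the paper's actual formula. It assumes the term compares the model's noise prediction on a downsampled copy of the sample with the downsampled high-resolution prediction, and adds the upsampled difference scaled by a guidance scale `w`. The names `downsample`, `upsample`, `guided_eps`, and `toy_eps` are hypothetical, and `toy_eps` is a stand-in for a real pretrained noise predictor. A faithful version would likely also need to account for how downsampling changes the noise level of the sample; that is omitted here.

```python
import numpy as np


def downsample(x: np.ndarray, m: int) -> np.ndarray:
    """Average-pool a 2D array by an integer factor m (H and W divisible by m)."""
    h, w = x.shape
    return x.reshape(h // m, m, w // m, m).mean(axis=(1, 3))


def upsample(x: np.ndarray, m: int) -> np.ndarray:
    """Nearest-neighbour upsample a 2D array by an integer factor m."""
    return np.repeat(np.repeat(x, m, axis=0), m, axis=1)


def guided_eps(eps_fn, x_t: np.ndarray, t: int, m: int, w: float) -> np.ndarray:
    """Assumed form of a single added guidance term (illustrative only).

    eps_fn : callable (x, t) -> predicted noise with the same shape as x
    m      : integer scale factor, e.g. 3 for 512 -> 1536 per side
    w      : guidance scale; w = 0 recovers the unguided prediction
    """
    eps_hi = eps_fn(x_t, t)                    # prediction at the target resolution
    eps_lo = eps_fn(downsample(x_t, m), t)     # prediction at the lower resolution
    # Difference between the two predictions, compared at low resolution,
    # then brought back to the target resolution.
    correction = upsample(eps_lo - downsample(eps_hi, m), m)
    return eps_hi + w * correction


def toy_eps(x: np.ndarray, t: int) -> np.ndarray:
    """Placeholder noise predictor (hypothetical); a real model would go here."""
    return 0.1 * x


# Example call: a 6x6 sample with scale factor 3, mirroring the 512 -> 1536 ratio.
rng = np.random.default_rng(0)
x_t = rng.standard_normal((6, 6))
eps = guided_eps(toy_eps, x_t, t=10, m=3, w=0.5)
```

With `w = 0` the function reduces to the plain model prediction, which is one way to check that the extra term is the only change to the sampler in this sketch.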

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
