CV · LG · Dec 7, 2023

Resolution Chromatography of Diffusion Models

arXiv:2401.10247v1 · 2 citations · h-index: 4
Originality: Incremental advance
AI Analysis

The paper provides a theoretical framework for understanding and improving diffusion models. The advance is incremental but useful for researchers and practitioners in image generation.

The paper introduces 'resolution chromatography' to mathematically explain the coarse-to-fine behavior of diffusion models, showing which resolution level dominates at each time step, and applies the concept to upscaling pre-trained models to higher resolutions and to time-dependent prompt composing.
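The coarse-to-fine behavior can be illustrated with a simple spectral argument that is commonly used for diffusion models; it is not necessarily the paper's own definition of resolution chromatography. The sketch below assumes a variance-preserving forward process with the cosine noise schedule, white Gaussian noise, and an idealized power-law image spectrum S(f) = f^-2. The helper names (`alpha_bar`, `cutoff_frequency`, `finest_resolved_side`) are hypothetical. Because the spectrum's scale is arbitrary, only the ordering of the printed values is meaningful, not the absolute side lengths.

```python
import math


def alpha_bar(t: float, s: float = 0.008) -> float:
    """Signal fraction alpha_bar(t) of the cosine schedule (Nichol & Dhariwal, 2021), t in (0, 1]."""
    if not 0.0 < t <= 1.0:
        raise ValueError("t must lie in (0, 1]")

    def f(u: float) -> float:
        return math.cos((u + s) / (1.0 + s) * math.pi / 2.0) ** 2

    return f(t) / f(0.0)


def cutoff_frequency(t: float, exponent: float = 2.0) -> float:
    """Frequency (cycles per image, arbitrary scale) at which the per-frequency SNR equals 1.

    Assumes x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps with white noise eps and an assumed
    image power spectrum S(f) = f**-exponent, so SNR(f) = a * f**-exponent / (1 - a).
    Frequencies below the cutoff are signal-dominated; those above it are still noise.
    """
    a = alpha_bar(t)
    if a >= 1.0:  # numerically noise-free: every frequency counts as resolved
        return math.inf
    return (a / (1.0 - a)) ** (1.0 / exponent)


def finest_resolved_side(t: float, max_side: int = 4096) -> int:
    """Largest power-of-two side N (capped at max_side) whose Nyquist frequency N / 2 is
    signal-dominated; returns 1 when not even a 2 x 2 level is resolved."""
    fc = cutoff_frequency(t)
    side = 1
    # Moving to the next level (2 * side) requires its Nyquist frequency, side, to be below fc.
    while 2 * side <= max_side and side <= fc:
        side *= 2
    return side


if __name__ == "__main__":
    # Sampling runs from t = 1 (pure noise) toward t = 0, so the cutoff grows as t shrinks:
    # coarse levels become signal-dominated before fine ones.
    for t in (0.9, 0.7, 0.5, 0.3, 0.1, 0.01):
        print(f"t={t:.2f}  cutoff={cutoff_frequency(t):9.3f}  finest side={finest_resolved_side(t)}")
```

Under these assumptions the cutoff rises monotonically as t decreases, which reproduces the blurry-to-sharp progression described above; a real image spectrum or a different schedule would move the crossing points but not the ordering.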

Diffusion models generate high-resolution images through iterative stochastic processes. In particular, the denoising approach, which predicts the noise in a sample and removes it at each time step, is one of the most popular. It has been commonly observed that the resolution of generated samples changes over time: samples start off blurry and coarse and become sharper and finer. In this paper, we introduce "resolution chromatography", which indicates the signal generation rate at each resolution. This concept helps to mathematically explain the coarse-to-fine behavior of the generation process, to understand the role of the noise schedule, and to design time-dependent modulation. Using resolution chromatography, we determine which resolution level becomes dominant at a specific time step, and we experimentally verify our theory with text-to-image diffusion models. We also propose some direct applications of the concept: upscaling pre-trained models to higher resolutions and time-dependent prompt composing. Our theory not only enables a better understanding of numerous existing techniques for manipulating image generation but also suggests the potential for designing better noise schedules.
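The abstract does not spell out the paper's upscaling procedure, so the following swaps in a known, related technique: the resolution-dependent log-SNR shift from Hoogeboom et al.'s "simple diffusion" (2023), which adds 2 log(base_side / new_side) to a cosine schedule's log-SNR. It shows how a noise schedule can be adjusted so that a model trained at one resolution keeps the same coarse-level signal-to-noise trajectory at a larger size. The base size of 64, the target of 256, the clipping range, and the helper names (`logsnr_cosine`, `logsnr_shifted`, `alpha_sigma`) are assumptions for illustration.

```python
import math


def logsnr_cosine(t: float, logsnr_min: float = -15.0, logsnr_max: float = 15.0) -> float:
    """Cosine-schedule log-SNR, -2 log tan(angle), with t in [0, 1] mapped linearly to an
    angle range that keeps the result within [logsnr_min, logsnr_max]."""
    t_min = math.atan(math.exp(-0.5 * logsnr_max))
    t_max = math.atan(math.exp(-0.5 * logsnr_min))
    return -2.0 * math.log(math.tan(t_min + t * (t_max - t_min)))


def logsnr_shifted(t: float, base_side: int = 64, new_side: int = 256) -> float:
    """Shift the schedule for a larger canvas by adding 2 log(base_side / new_side)."""
    return logsnr_cosine(t) + 2.0 * math.log(base_side / new_side)


def alpha_sigma(logsnr: float) -> tuple[float, float]:
    """Variance-preserving coefficients: alpha^2 = sigmoid(logsnr), sigma^2 = sigmoid(-logsnr)."""
    alpha = math.sqrt(1.0 / (1.0 + math.exp(-logsnr)))
    sigma = math.sqrt(1.0 / (1.0 + math.exp(logsnr)))
    return alpha, sigma


if __name__ == "__main__":
    k = 256 / 64  # assumed upscaling factor per side
    for t in (0.1, 0.3, 0.5, 0.7, 0.9):
        shifted = logsnr_shifted(t)
        # Average-pooling k x k pixels divides independent noise variance by k^2 while a smooth
        # signal is roughly unchanged, so the coarse (base-resolution) view gains 2 log k.
        coarse_view = shifted + 2.0 * math.log(k)
        alpha, sigma = alpha_sigma(shifted)
        print(f"t={t:.1f}  base={logsnr_cosine(t):7.3f}  shifted={shifted:7.3f}  "
              f"coarse view={coarse_view:7.3f}  alpha={alpha:.3f}  sigma={sigma:.3f}")
```

With the shift, the 64 x 64 content of a 256 x 256 sample follows the log-SNR trajectory the original schedule gave it at 64 x 64. Without it, the coarse level would become signal-dominated earlier at the larger size, which is the kind of resolution-dependent timing that resolution chromatography is meant to describe.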

