CV · AI · LG · Nov 15, 2024

Guiding a diffusion model using sliding windows

arXiv:2411.10257v3 · 3 citations · h-index: 6 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of improving diffusion model performance for image generation without additional training, offering a novel guidance technique that is incremental but effective.

The paper tackles the problem of enhancing sample quality in diffusion models by introducing a training-free method called masked sliding window guidance (M-SWG), which upweights long-range spatial dependencies. On its own, M-SWG achieves a higher Inception score than previous training-free approaches; combined with existing guidance methods, it reaches state-of-the-art Fréchet DINOv2 distance on ImageNet.

Guidance is a widely used technique for diffusion models to enhance sample quality. Technically, guidance is realised by using an auxiliary model that generalises more broadly than the primary model. Using a 2D toy example, we first show that it is highly beneficial when the auxiliary model exhibits similar but stronger generalisation errors than the primary model. Based on this insight, we introduce masked sliding window guidance (M-SWG), a novel, training-free method. M-SWG upweights long-range spatial dependencies by guiding the primary model with a version of itself whose receptive field is selectively restricted. M-SWG requires no access to model weights from previous iterations, no additional training, and no class conditioning. M-SWG achieves a superior Inception score (IS) compared to previous state-of-the-art training-free approaches, without introducing sample oversaturation. In conjunction with existing guidance methods, M-SWG reaches state-of-the-art Fréchet DINOv2 distance on ImageNet using EDM2-XXL and DiT-XL. The code is available at https://github.com/HHU-MMBS/swg_bmvc2025_official.
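The idea of guiding a model with a receptive-field-restricted copy of itself can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it uses plain overlapping crops rather than the masking that M-SWG describes, and the function names (`sliding_window_predict`, `guided_predict`), the `weight` parametrisation, and the callable `denoise` are all hypothetical choices made for illustration.

```python
import numpy as np

def sliding_window_predict(denoise, x, window, stride):
    """Auxiliary prediction: run `denoise` on overlapping crops of x (H, W, C)
    and average the overlaps, so each output pixel only sees a local window.
    Crop-based stand-in; the paper's masked variant is not reproduced here."""
    h, w = x.shape[:2]
    out = np.zeros_like(x, dtype=float)
    count = np.zeros((h, w, 1), dtype=float)
    # Window origins; the last origin is clamped so the border is covered.
    ys = sorted(set(list(range(0, h - window + 1, stride)) + [h - window]))
    xs = sorted(set(list(range(0, w - window + 1, stride)) + [w - window]))
    for top in ys:
        for left in xs:
            crop = x[top:top + window, left:left + window]
            out[top:top + window, left:left + window] += denoise(crop)
            count[top:top + window, left:left + window] += 1.0
    return out / count

def guided_predict(denoise, x, weight, window, stride):
    """Extrapolate from the auxiliary prediction towards the primary one.
    weight = 1 returns the primary prediction; weight > 1 amplifies whatever
    the full receptive field contributes beyond the local windows.
    (Assumed parametrisation, chosen for this sketch.)"""
    primary = denoise(x)
    auxiliary = sliding_window_predict(denoise, x, window, stride)
    return auxiliary + weight * (primary - auxiliary)
```

With a denoiser that depends on global context, for example `lambda z: z - z.mean()`, the primary and auxiliary predictions differ, and a `weight` above 1 pushes the result further towards the long-range information. For a purely pointwise denoiser the two predictions coincide and guidance has no effect.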
