CV · Nov 30, 2023

ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation

arXiv:2311.18822v3 · 41 citations · h-index: 4
Originality: Incremental advance
AI Analysis

The paper addresses a practical problem for users of text-to-image models who need flexible image dimensions, though it is an incremental improvement over existing decoding strategies.

The paper tackles the restriction of diffusion models to a few fixed image sizes by introducing ElasticDiffusion, a training-free decoding method that enables arbitrary-size generation. It reports better image coherence across aspect ratios than MultiDiffusion and the standard decoding strategy of Stable Diffusion.

Diffusion models have revolutionized image generation in recent years, yet they are still limited to a few sizes and aspect ratios. We propose ElasticDiffusion, a novel training-free decoding method that enables pretrained text-to-image diffusion models to generate images with various sizes. ElasticDiffusion attempts to decouple the generation trajectory of a pretrained model into local and global signals. The local signal controls low-level pixel information and can be estimated on local patches, while the global signal is used to maintain overall structural consistency and is estimated with a reference image. We test our method on CelebA-HQ (faces) and LAION-COCO (objects/indoor/outdoor scenes). Our experiments and qualitative results show superior image coherence quality across aspect ratios compared to MultiDiffusion and the standard decoding strategy of Stable Diffusion. Project page: https://elasticdiffusion.github.io/
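The global-local separation described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: `toy_local`, `toy_global`, the nearest-neighbour resize, the non-overlapping tiling, and the additive combination with a `weight` are all assumptions made for illustration, and plain nested lists stand in for latent tensors. The sketch only shows the data flow: a local signal computed patch by patch on a canvas of arbitrary size, and a global signal computed once on a small fixed-size reference and stretched to fit.

```python
# Toy sketch only: every function here is a hypothetical stand-in,
# not the ElasticDiffusion code or any diffusion library API.

def local_signal_by_patches(canvas, patch, local_fn):
    """Run local_fn independently on non-overlapping patches of the canvas."""
    h, w = len(canvas), len(canvas[0])
    out = [[0.0] * w for _ in range(h)]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tile = [row[left:left + patch] for row in canvas[top:top + patch]]
            est = local_fn(tile)  # same shape as the tile
            for i, row in enumerate(est):
                for j, value in enumerate(row):
                    out[top + i][left + j] = value
    return out

def resize_nearest(grid, out_h, out_w):
    """Nearest-neighbour resize of a 2D grid (assumed resampling choice)."""
    in_h, in_w = len(grid), len(grid[0])
    return [[grid[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]

def combined_signal(canvas, ref_size, patch, local_fn, global_fn, weight):
    """Combine a patch-wise local signal with a global signal from a reference."""
    h, w = len(canvas), len(canvas[0])
    local = local_signal_by_patches(canvas, patch, local_fn)
    reference = resize_nearest(canvas, ref_size, ref_size)  # fixed-size reference
    global_full = resize_nearest(global_fn(reference), h, w)
    return [[local[i][j] + weight * global_full[i][j] for j in range(w)]
            for i in range(h)]

def toy_local(tile):
    """Hypothetical local estimator: deviation from the tile mean."""
    values = [v for row in tile for v in row]
    mean = sum(values) / len(values)
    return [[v - mean for v in row] for row in tile]

def toy_global(reference):
    """Hypothetical global estimator: the reference mean at every position."""
    values = [v for row in reference for v in row]
    mean = sum(values) / len(values)
    return [[mean for _ in row] for row in reference]

# A 4x8 canvas, i.e. a 1:2 aspect ratio that differs from the square reference.
canvas = [[float(i * 8 + j) for j in range(8)] for i in range(4)]
out = combined_signal(canvas, ref_size=2, patch=4,
                      local_fn=toy_local, global_fn=toy_global, weight=0.5)
```

In this sketch the canvas size never reaches the global estimator, which only ever sees the fixed-size reference; that is the property the abstract relies on to keep structure consistent at sizes the model was not trained on.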

Code Implementations: 1 repo
