CV · AI · CL · GR · Dec 4, 2023

Generative Powers of Ten

UW
arXiv:2312.02149v2 · 13 citations · h-index: 65 · CVPR
AI Analysis

This work addresses the challenge of generating coherent multi-scale imagery for applications such as digital art and visualization, though it appears to be an incremental extension of existing diffusion methods.

The paper tackles the problem of generating consistent content across multiple image scales with a text-to-image model. It achieves extreme semantic zooms, from wide-angle views to macro shots, and qualitative comparisons with super-resolution and outpainting baselines indicate that it is the most effective at keeping content consistent across scales.

We present a method that uses a text-to-image model to generate consistent content across multiple image scales, enabling extreme semantic zooms into a scene, e.g., ranging from a wide-angle landscape view of a forest to a macro shot of an insect sitting on one of the tree branches. We achieve this through a joint multi-scale diffusion sampling approach that encourages consistency across different scales while preserving the integrity of each individual sampling process. Since each generated scale is guided by a different text prompt, our method enables deeper levels of zoom than traditional super-resolution methods that may struggle to create new contextual structure at vastly different scales. We compare our method qualitatively with alternative techniques in image super-resolution and outpainting, and show that our method is most effective at generating consistent multi-scale content.
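To make "consistency across different scales" concrete, the sketch below illustrates one possible reading of the constraint. It is not the paper's sampler. It assumes, hypothetically, that each level of a zoom stack magnifies the previous one by a fixed integer factor `p` about the image center, so the central `1/p` region of a coarser image should equal the next finer image downsampled by `p`. The function names, the box-filter downsampling, and the grayscale list-of-lists image format are all illustrative choices, not details taken from the abstract.

```python
# Illustrative only: a toy multi-scale consistency constraint on a "zoom stack".
# Assumption (not from the abstract): level i+1 is a p-times zoom into the
# center of level i, and all levels share the same resolution.

def downsample(img, p):
    """Box-filter downsample a 2D list of floats by integer factor p."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[y * p + dy][x * p + dx] for dy in range(p) for dx in range(p)) / (p * p)
            for x in range(w // p)
        ]
        for y in range(h // p)
    ]

def center_crop(img, p):
    """Return the central (h/p) x (w/p) region of a 2D list."""
    h, w = len(img), len(img[0])
    ch, cw = h // p, w // p
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in img[top:top + ch]]

def enforce_consistency(stack, p):
    """Overwrite the center of each coarser level with the downsampled finer level.

    Works from the finest level to the coarsest so that every adjacent pair
    agrees afterwards. Returns a new stack; the input is left unchanged.
    """
    out = [[row[:] for row in level] for level in stack]
    for i in range(len(out) - 2, -1, -1):
        h, w = len(out[i]), len(out[i][0])
        ch, cw = h // p, w // p
        top, left = (h - ch) // 2, (w - cw) // 2
        small = downsample(out[i + 1], p)
        for y in range(ch):
            out[i][top + y][left:left + cw] = small[y]
    return out

def max_inconsistency(stack, p):
    """Largest absolute difference between any level's center and the next level downsampled."""
    worst = 0.0
    for coarse, fine in zip(stack, stack[1:]):
        crop, small = center_crop(coarse, p), downsample(fine, p)
        for row_c, row_s in zip(crop, small):
            for a, b in zip(row_c, row_s):
                worst = max(worst, abs(a - b))
    return worst
```

In a diffusion setting, a constraint of this kind would presumably be applied to intermediate estimates at each sampling step rather than once to finished images, while each level is still denoised under its own text prompt. How the authors actually combine the levels is not specified in the abstract, so the hard overwrite used here should be read as a stand-in.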

