CV | Dec 8, 2024

Nested Diffusion Models Using Hierarchical Latent Priors

arXiv:2412.05984v1 | 2 citations | h-index: 6 | CVPR
Originality: Highly original
AI Analysis

This work aims to improve the image generation quality of diffusion models for applications in computer vision and generative AI. It is an incremental advancement achieved through a novel hierarchical approach.

The paper tackles high-quality image generation for complex scenes by introducing nested diffusion models, a hierarchical generative framework in which a series of diffusion models progressively generates latent variables. The authors report significant improvements in image quality across multiple datasets for both unconditional and conditional generation.

We introduce nested diffusion models, an efficient and powerful hierarchical generative framework that substantially enhances the generation quality of diffusion models, particularly for images of complex scenes. Our approach employs a series of diffusion models to progressively generate latent variables at different semantic levels. Each model in this series is conditioned on the output of the preceding higher-level models, culminating in image generation. Hierarchical latent variables guide the generation process along predefined semantic pathways, allowing our approach to capture intricate structural details while significantly improving image quality. To construct these latent variables, we leverage a pre-trained visual encoder, which learns strong semantic visual representations, and modulate its capacity via dimensionality reduction and noise injection. Across multiple datasets, our system demonstrates significant enhancements in image quality for both unconditional and class/text conditional generation. Moreover, our unconditional generation system substantially outperforms the baseline conditional system. These advancements incur minimal computational overhead as the more abstract levels of our hierarchy work with lower-dimensional representations.
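
The two mechanisms described in the abstract, building latent variables of varying capacity and chaining generators that each condition on the levels above, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes PCA as the dimensionality reduction and additive Gaussian noise as the noise injection, neither of which the abstract specifies. It uses random arrays in place of real encoder features and plain callables in place of trained diffusion models. The names `build_latent_hierarchy` and `nested_sample` are hypothetical.

```python
import numpy as np


def build_latent_hierarchy(features, dims, noise_stds, seed=0):
    """Toy construction of hierarchical latents from encoder features.

    features:   (n, d) array standing in for pre-trained encoder outputs.
    dims:       target dimensionality per level, most abstract level first.
    noise_stds: standard deviation of Gaussian noise injected per level.

    Assumption: PCA and Gaussian noise are illustrative choices only.
    """
    rng = np.random.default_rng(seed)
    centered = features - features.mean(axis=0, keepdims=True)
    # PCA via SVD: rows of vt are principal directions, sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    levels = []
    for k, std in zip(dims, noise_stds):
        z = centered @ vt[:k].T                      # reduce to k dimensions
        z = z + std * rng.standard_normal(z.shape)   # inject noise
        levels.append(z)
    return levels


def nested_sample(samplers, rng):
    """Run the chain from the most abstract level down to the image.

    Each entry of `samplers` is a callable (conditions, rng) -> sample,
    a placeholder for one diffusion model. It receives the outputs of all
    preceding, higher-level samplers as a tuple.
    """
    outputs = []
    for sample_fn in samplers:
        outputs.append(sample_fn(tuple(outputs), rng))
    return outputs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((32, 16))  # stand-in for encoder features
    coarse, fine = build_latent_hierarchy(feats, dims=[2, 8], noise_stds=[0.5, 0.1])
    print(coarse.shape, fine.shape)
```

In this sketch a smaller `dims` entry and a larger `noise_stds` entry both reduce how much information a level carries, which is one way to read "modulate its capacity" in the abstract. The lower-dimensional levels are also cheaper to model, in line with the abstract's remark on computational overhead.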

