CV · LG · IV · May 31, 2025

Latent Wavelet Diffusion For Ultra-High-Resolution Image Synthesis

arXiv:2506.00433v3 · 2 citations · h-index: 26
Originality: Incremental advance
AI Analysis

This work addresses the trade-off between computational efficiency and detail preservation in generative modeling of high-resolution images, offering a practical way to scale existing models without adding inference cost.

The paper tackles the challenge of high-resolution image synthesis by introducing Latent Wavelet Diffusion (LWD), a lightweight training framework that improves detail and texture fidelity for ultra-high-resolution (2K-4K) images, achieving consistent improvements in perceptual quality and FID scores across multiple baselines.

High-resolution image synthesis remains a core challenge in generative modeling, particularly in balancing computational efficiency with the preservation of fine-grained visual detail. We present Latent Wavelet Diffusion (LWD), a lightweight training framework that significantly improves detail and texture fidelity in ultra-high-resolution (2K-4K) image synthesis. LWD introduces a novel, frequency-aware masking strategy derived from wavelet energy maps, which dynamically focuses the training process on detail-rich regions of the latent space. This is complemented by a scale-consistent VAE objective to ensure high spectral fidelity. The primary advantage of our approach is its efficiency: LWD requires no architectural modifications and adds zero additional cost during inference, making it a practical solution for scaling existing models. Across multiple strong baselines, LWD consistently improves perceptual quality and FID scores, demonstrating the power of signal-driven supervision as a principled and efficient path toward high-resolution generative modeling.
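The abstract does not spell out how the wavelet energy maps or the mask are computed, so the following is only a minimal sketch of the general idea under stated assumptions: a single-level Haar transform, a quantile threshold on detail-band energy, and a masked mean-squared error. The names `haar_detail_energy`, `energy_mask`, `masked_mse`, and the `keep_fraction` parameter are hypothetical and are not taken from the paper. The sketch changes only the training loss, which is consistent with the claim that the method needs no architectural changes and adds no inference cost.

```python
import numpy as np

def haar_detail_energy(latent):
    """Energy of the three Haar detail sub-bands, summed over channels.

    latent: array of shape (C, H, W) with even H and W.
    Returns an (H/2, W/2) map; large values mark detail-rich regions.
    Assumption: a single-level Haar transform stands in for whatever
    wavelet the paper actually uses.
    """
    a = latent[:, 0::2, 0::2]  # top-left sample of each 2x2 block
    b = latent[:, 0::2, 1::2]  # top-right
    c = latent[:, 1::2, 0::2]  # bottom-left
    d = latent[:, 1::2, 1::2]  # bottom-right
    lh = (a + b - c - d) / 2.0  # difference between rows
    hl = (a - b + c - d) / 2.0  # difference between columns
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return (lh ** 2 + hl ** 2 + hh ** 2).sum(axis=0)

def energy_mask(latent, keep_fraction=0.5):
    """Binary (H, W) mask selecting the highest-energy locations.

    Assumption: a hard quantile threshold; the paper's mask may be
    soft or vary during training.
    """
    energy = haar_detail_energy(latent)
    threshold = np.quantile(energy, 1.0 - keep_fraction)
    coarse = (energy >= threshold).astype(latent.dtype)
    # Upsample the half-resolution mask back to latent resolution.
    return np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1)

def masked_mse(pred, target, mask):
    """Mean squared error restricted to locations where mask == 1."""
    weight = mask[None, :, :]            # broadcast over channels
    squared_error = (pred - target) ** 2
    denom = max(weight.sum() * pred.shape[0], 1.0)
    return (squared_error * weight).sum() / denom
```

In a training loop, `pred` and `target` would be the model's output and the regression target for a latent, and `mask` would be derived from the clean latent, so that gradient signal concentrates on textured regions while flat regions contribute nothing to this term.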

