CV · Apr 2, 2025

FreSca: Scaling in Frequency Space Enhances Diffusion Models

arXiv:2504.02154v3 · 1 citation · h-index: 13
Originality: Incremental advance
AI Analysis

FreSca offers a model- and task-agnostic way to improve control in LDMs and addresses a specific bottleneck in image tasks. The advance is incremental: it builds on the noise difference already produced by classifier-free guidance and requires no retraining.

The paper tackles the challenge of achieving fine-grained, disentangled control over global structures versus fine details in latent diffusion models (LDMs). It introduces FreSca, a plug-and-play framework that decomposes the noise difference into low- and high-frequency components and applies an independent scaling factor to each. The authors report improved generation quality and structural emphasis across multiple architectures and across applications such as image generation and video synthesis.

Latent diffusion models (LDMs) have achieved remarkable success in a variety of image tasks, yet achieving fine-grained, disentangled control over global structures versus fine details remains challenging. This paper explores frequency-based control within latent diffusion models. We first systematically analyze frequency characteristics across pixel space, VAE latent space, and internal LDM representations. This reveals that the "noise difference" term, derived from classifier-free guidance at each step t, is a uniquely effective and semantically rich target for manipulation. Building on this insight, we introduce FreSca, a novel and plug-and-play framework that decomposes noise difference into low- and high-frequency components and applies independent scaling factors to them via spatial or energy-based cutoffs. Essentially, FreSca operates without any model retraining or architectural change, offering model- and task-agnostic control. We demonstrate its versatility and effectiveness in improving generation quality and structural emphasis on multiple architectures (e.g., SD3, SDXL) and across applications including image generation, editing, depth estimation, and video synthesis, thereby unlocking a new dimension of expressive control within LDMs.
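The decomposition described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `fresca_scale` and `guided_noise`, the circular low-pass mask, the meaning of `cutoff` (a fraction of half the smaller spatial side), and the default scale values are all hypothetical. Only the spatial-cutoff variant is shown; the energy-based cutoff mentioned in the abstract is omitted. A real pipeline would operate on framework tensors inside the sampler loop.

```python
import numpy as np

def fresca_scale(delta, low_scale=1.0, high_scale=1.25, cutoff=0.25):
    """Rescale low- and high-frequency parts of an array shaped (..., H, W).

    Assumption: `cutoff` is the radius of a circular low-pass region,
    given as a fraction of half the smaller spatial side.
    """
    h, w = delta.shape[-2:]
    # 2D FFT over the last two axes, with the DC term shifted to the center.
    spectrum = np.fft.fftshift(np.fft.fft2(delta), axes=(-2, -1))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= cutoff * min(h, w) / 2
    # One scale factor inside the low-pass region, another outside it.
    scale = np.where(low_mask, low_scale, high_scale)
    scaled = np.fft.ifft2(np.fft.ifftshift(spectrum * scale, axes=(-2, -1)))
    # The mask is symmetric in frequency, so the imaginary part is rounding noise.
    return scaled.real

def guided_noise(eps_uncond, eps_cond, guidance_scale,
                 low_scale=1.0, high_scale=1.25, cutoff=0.25):
    """Classifier-free guidance with a frequency-rescaled noise difference."""
    delta = eps_cond - eps_uncond  # the "noise difference" term
    delta = fresca_scale(delta, low_scale, high_scale, cutoff)
    return eps_uncond + guidance_scale * delta
```

With `low_scale = high_scale = 1.0` the sketch reduces to ordinary classifier-free guidance, which is a convenient sanity check. Under the assumptions above, raising `high_scale` amplifies fine detail in the guidance signal, while raising `low_scale` emphasizes coarse structure.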

