CVApr 16

Frequency-Aware Flow Matching for High-Quality Image Generation

Sucheng Ren, Qihang Yu, Ju He, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen

arXiv:2604.1552181.1h-index: 12Has Code

Predicted impact top 27% in CV · last 90 daysOriginality Incremental advance

AI Analysis

For image generation researchers, this work improves fine-detail fidelity in flow matching models, though it is an incremental improvement over existing methods.

FreqFlow introduces frequency-aware conditioning into flow matching via a two-branch architecture, achieving state-of-the-art FID of 1.38 on ImageNet-256, outperforming DiT by 0.79 and SiT by 0.58 FID.

Flow matching models have emerged as a powerful framework for realistic image generation by learning to reverse a corruption process that progressively adds Gaussian noise. However, because noise is injected in the latent domain, its impact on different frequency components is non-uniform. As a result, during inference, flow matching models tend to generate low-frequency components (global structure) in the early stages, while high-frequency components (fine details) emerge only later in the reverse process. Building on this insight, we propose Frequency-Aware Flow Matching (FreqFlow), a novel approach that explicitly incorporates frequency-aware conditioning into the flow matching framework via time-dependent adaptive weighting. We introduce a two-branch architecture: (1) a frequency branch that separately processes low- and high-frequency components to capture global structure and refine textures and edges, and (2) a spatial branch that synthesizes images in the latent domain, guided by the frequency branch's output. By explicitly integrating frequency information into the generation process, FreqFlow ensures that both large-scale coherence and fine-grained details are effectively modeled low-frequency conditioning reinforces global structure, while high-frequency conditioning enhances texture fidelity and detail sharpness. On the class-conditional ImageNet-256 generation benchmark, our method achieves state-of-the-art performance with an FID of 1.38, surpassing the prior diffusion model DiT and flow matching model SiT by 0.79 and 0.58 FID, respectively. Code is available at https://github.com/OliverRensu/FreqFlow.

View on arXiv PDF Code

Similar