CV | AI | Dec 22, 2025

MixFlow Training: Alleviating Exposure Bias with Slowed Interpolation Mixture

arXiv:2512.19311v1
h-index: 6
Originality: Incremental advance
AI Analysis

The paper addresses a key training-testing discrepancy in diffusion models for image generation, offering incremental improvements over existing methods.

This paper tackles the exposure bias problem in diffusion models by introducing MixFlow, a post-training approach that uses slowed interpolation mixtures to bring training inputs closer to the inputs seen at test time. Applied to RAE models, it reports ImageNet FID scores as low as 1.10 with guidance at both 256x256 and 512x512 resolution.

This paper studies the training-testing discrepancy (a.k.a. exposure bias) problem in diffusion models. During training, the input to the prediction network at a given timestep is the corresponding ground-truth noisy data, an interpolation of the noise and the data; during testing, the input is the generated noisy data. We present a novel training approach, named MixFlow, to improve generation performance. Our approach is motivated by the Slow Flow phenomenon: the ground-truth interpolation nearest to the generated noisy data at a given sampling timestep is observed to correspond to a higher-noise timestep (termed the slowed timestep), i.e., the corresponding ground-truth timestep is slower than the sampling timestep. MixFlow uses the interpolations at the slowed timesteps, named the slowed interpolation mixture, to post-train the prediction network at each training timestep. Experiments on class-conditional image generation (including SiT, REPA, and RAE) and text-to-image generation validate the effectiveness of our approach. Applied to RAE models, MixFlow achieves strong generation results on ImageNet: 1.43 FID (without guidance) and 1.10 FID (with guidance) at 256 x 256, and 1.55 FID (without guidance) and 1.10 FID (with guidance) at 512 x 512.
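
To make the idea of a slowed interpolation mixture concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not specify the interpolation schedule, the distribution of slowed timesteps, or the regression target, so everything here is an assumption: a linear interpolation with t = 0 as pure noise and t = 1 as clean data, a slowed timestep s drawn uniformly from a window just below t, and a velocity-style target. The names `interpolate`, `sample_slowed_timestep`, `mixflow_training_pair`, and `max_slowdown` are hypothetical.

```python
import random


def interpolate(x, eps, t):
    """Linear interpolation between noise and data.

    Assumed convention (not from the paper): t = 0 is pure noise, t = 1 is clean data.
    """
    return [(1.0 - t) * e + t * xi for xi, e in zip(x, eps)]


def sample_slowed_timestep(t, max_slowdown, rng):
    """Draw a slowed (higher-noise) timestep s with max(0, t - max_slowdown) <= s <= t.

    The uniform window is a hypothetical choice for illustration only.
    """
    low = max(0.0, t - max_slowdown)
    return rng.uniform(low, t)


def mixflow_training_pair(x, max_slowdown=0.1, rng=random):
    """Build one training example in which the network is conditioned on
    timestep t but receives the interpolation at a slowed timestep s."""
    t = rng.random()                          # training timestep the network is told
    eps = [rng.gauss(0.0, 1.0) for _ in x]    # Gaussian noise sample
    s = sample_slowed_timestep(t, max_slowdown, rng)
    x_s = interpolate(x, eps, s)              # network input: noisier than x_t
    # Assumed target: velocity of the linear path, which does not depend on t.
    target = [xi - e for xi, e in zip(x, eps)]
    return x_s, t, s, target
```

In a training loop under these assumptions, the network would be called as `model(x_s, t)` and regressed onto `target`, so that at timestep t it sees inputs resembling the slightly noisier states that sampling actually produces.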

