Classifier-Free Guidance: From High-Dimensional Analysis to Generalized Guidance Forms
This work provides a theoretical foundation for Classifier-Free Guidance (CFG) in generative models, addressing a gap that matters to researchers and practitioners working on high-dimensional applications such as text-to-image generation.
The paper characterizes the distribution induced by Classifier-Free Guidance (CFG) in high-dimensional settings. It shows that the distortions CFG introduces vanish as the data dimension grows, so that CFG accurately reproduces the target distribution in the infinite-dimensional limit. Experiments with a non-linear generalization of CFG show improved robustness, fidelity, and diversity.
Classifier-Free Guidance (CFG) is a widely adopted technique in diffusion and flow-based generative models, enabling high-quality conditional generation. A key theoretical challenge is characterizing the distribution induced by CFG, particularly in high-dimensional settings relevant to real-world data. Previous works have shown that CFG modifies the target distribution, steering it towards a distribution that is sharper than the target and shifted towards the boundary of the class. In this work, we provide a high-dimensional analysis of CFG, showing that these distortions vanish as the data dimension grows. We present a blessing-of-dimensionality result demonstrating that in sufficiently high and infinite dimensions, CFG accurately reproduces the target distribution. Using our high-dimensional theory, we show that there is a large family of guidance schemes enjoying this property, in particular non-linear generalizations of CFG. We study a simple non-linear power-law version, for which we demonstrate improved robustness, sample fidelity, and diversity. Our findings are validated with experiments on class-conditional and text-to-image generation using state-of-the-art diffusion and flow-matching models.
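To make the contrast between standard and non-linear guidance concrete, the following is a minimal sketch, not the paper's implementation. The linear rule is the usual CFG combination of conditional and unconditional score (or noise) predictions, written in the convention where `omega = 0` recovers the purely conditional prediction. The abstract does not state the power-law formula, so the form used here is an assumption chosen for illustration: the guidance weight is multiplied by the norm of the conditional-unconditional difference raised to a power `alpha`. The function names, the parameters `omega` and `alpha`, and the restriction `alpha >= 0` are hypothetical.

```python
import numpy as np


def cfg_linear(score_cond, score_uncond, omega):
    """Standard (linear) CFG.

    Convention assumed here: omega = 0 returns the conditional prediction,
    and omega > 0 pushes further away from the unconditional one.
    """
    return score_cond + omega * (score_cond - score_uncond)


def cfg_power_law(score_cond, score_uncond, omega, alpha):
    """Illustrative power-law guidance (ASSUMED form, not taken from the abstract).

    The guidance weight is scaled by ||score_cond - score_uncond||**alpha,
    with the norm taken over the last axis. alpha = 0 reduces to cfg_linear.
    alpha >= 0 is assumed so that a zero difference does not divide by zero.
    """
    diff = score_cond - score_uncond
    norm = np.linalg.norm(diff, axis=-1, keepdims=True)
    return score_cond + omega * norm**alpha * diff


# Toy predictions for a single 2-dimensional sample.
score_cond = np.array([1.0, 1.0])
score_uncond = np.array([-2.0, -3.0])  # difference is [3, 4], norm 5

linear = cfg_linear(score_cond, score_uncond, omega=2.0)
power = cfg_power_law(score_cond, score_uncond, omega=2.0, alpha=1.0)
print(linear)  # [7. 9.]
print(power)   # [31. 41.]
```

Under this assumed form, the extra push from guidance grows where the conditional and unconditional predictions disagree strongly and shrinks where they nearly coincide, which is one way a non-linear rule can behave differently from a fixed guidance weight.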