REG: Rectified Gradient Guidance for Conditional Diffusion Models
This work addresses a gap between the theory and practice of guidance in conditional diffusion models and offers an incremental enhancement that improves generation quality in AI image synthesis.
The paper tackles the discrepancy between the theoretical motivation and the practical implementation of guidance techniques in conditional diffusion models. It proposes rectified gradient guidance (REG), a method grounded in a valid scaled joint distribution objective. Experiments show that adding REG to existing guidance methods consistently improves FID and Inception/CLIP scores on tasks such as class-conditional ImageNet and text-to-image generation, compared with the same methods without REG.
Guidance techniques are simple yet effective for improving conditional generation in diffusion models. Despite their empirical success, the practical implementation of guidance diverges significantly from its theoretical motivation. In this paper, we reconcile this discrepancy by replacing the scaled marginal distribution target, which we prove to be theoretically invalid, with a valid scaled joint distribution objective. Additionally, we show that the established guidance implementations are approximations to the intractable optimal solution under a no-future-foresight constraint. Building on these theoretical insights, we propose rectified gradient guidance (REG), a versatile enhancement designed to boost the performance of existing guidance methods. Experiments in 1D and 2D settings demonstrate that REG provides a better approximation to the optimal solution than prior guidance techniques, validating the proposed theoretical framework. Extensive experiments on class-conditional ImageNet and text-to-image generation tasks show that incorporating REG consistently improves FID and Inception/CLIP scores across various settings compared with the same guidance methods without it.
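For orientation, the sketch below shows classifier-free guidance, a widely used example of the kind of established guidance implementation that REG is designed to enhance. It is not REG itself: the abstract does not specify REG's rectification term, so none is shown. The function name `cfg_noise` is hypothetical, noise predictions are modeled as plain lists of floats, and the weight convention (where `w = 1` recovers the conditional prediction) is an assumption. Some papers instead write `(1 + w) * eps_cond - w * eps_uncond`.

```python
from typing import List, Sequence


def cfg_noise(eps_uncond: Sequence[float],
              eps_cond: Sequence[float],
              w: float) -> List[float]:
    """Classifier-free guidance combination (hypothetical helper, not REG).

    Extrapolates from the unconditional noise prediction toward the
    conditional one: eps = eps_uncond + w * (eps_cond - eps_uncond).
    Under this convention, w = 0 gives the unconditional prediction,
    w = 1 gives the conditional prediction, and w > 1 strengthens the
    conditioning signal.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("noise predictions must have the same length")
    # Element-wise guided prediction, one entry per noise component.
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    # Toy 2-component noise predictions with guidance weight 2.
    print(cfg_noise([0.0, 1.0], [1.0, 3.0], 2.0))
```

Because the guided prediction is a fixed linear extrapolation evaluated at each step using only the current noisy sample, it reflects the "no future foresight" character that the abstract attributes to established guidance implementations.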