In-situ Autoguidance: Eliciting Self-Correction in Diffusion Models
This work addresses the computational overhead of guidance in image-generating diffusion models, offering an incremental improvement by eliminating the need for a separately trained guiding model.
The paper tackles the trade-off between image quality and diversity in diffusion models by introducing In-situ Autoguidance, a method that enables self-correction without auxiliary models, establishing a cost-efficient baseline for guidance.
Generating high-quality, diverse, and prompt-aligned images is a central goal of image-generating diffusion models. The popular classifier-free guidance (CFG) approach improves quality and alignment at the cost of reduced variation, so these effects are inherently entangled. Recent work has disentangled them by guiding a model with a separately trained, inferior counterpart; however, this solution carries the considerable overhead of an auxiliary model. We challenge this prerequisite by introducing In-situ Autoguidance, a method that elicits guidance from the model itself without any auxiliary components. Our approach generates an inferior prediction on the fly using a stochastic forward pass, reframing guidance as a form of inference-time self-correction. We demonstrate that this zero-cost approach is not only viable but also establishes a strong new baseline for cost-efficient guidance, showing that the benefits of self-guidance can be achieved without external models.
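The idea of guiding a model with its own degraded prediction can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how the stochastic forward pass is realized, so the sketch assumes input dropout as the source of randomness, and the names `toy_denoiser`, `guide`, `in_situ_guided_prediction`, `w`, and `drop_p` are hypothetical. The extrapolation form `bad + w * (good - bad)` is the standard guidance formula and is assumed to carry over here.

```python
import random


def guide(good, bad, w):
    """Extrapolate away from the inferior prediction: bad + w * (good - bad)."""
    return [b + w * (g - b) for g, b in zip(good, bad)]


def toy_denoiser(x, drop_p=0.0, rng=None):
    """Hypothetical stand-in for a denoiser network: halves each input value.

    With drop_p > 0, inverted dropout is applied to the inputs, which makes
    the pass stochastic. This is an assumed way to degrade the prediction.
    """
    out = []
    for v in x:
        if drop_p > 0.0 and rng.random() < drop_p:
            v = 0.0                      # unit dropped
        elif drop_p > 0.0:
            v = v / (1.0 - drop_p)       # rescale surviving units
        out.append(0.5 * v)
    return out


def in_situ_guided_prediction(x, w=2.0, drop_p=0.3, seed=0):
    """Guide the model with its own stochastic pass; no auxiliary model."""
    rng = random.Random(seed)
    good = toy_denoiser(x)                          # deterministic pass
    bad = toy_denoiser(x, drop_p=drop_p, rng=rng)   # stochastic pass, same weights
    return guide(good, bad, w)
```

With `w = 1` the guided output equals the deterministic prediction, and with `drop_p = 0` both passes coincide, so guidance has no effect. Note that this sketch runs two forward passes per call; the saving it illustrates is that no second model has to be trained or stored.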