Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method
This work addresses a specific bottleneck in guided diffusion sampling for image and video generation, offering an incremental improvement over existing methods.
The paper tackles inaccurate guidance in training-free guided sampling for diffusion models, a problem that is most severe in the early stages of generation. It proposes Symplectic Adjoint Guidance (SAG), which estimates the clean image with multiple function calls and computes the guidance gradients with a symplectic adjoint method, yielding higher-quality image and video generation than the baselines.
Training-free guided sampling in diffusion models leverages off-the-shelf pre-trained networks, such as an aesthetic evaluation model, to guide the generation process. Current training-free guided sampling algorithms compute the guidance energy function from a one-step estimate of the clean image. However, the off-the-shelf pre-trained networks are trained on clean images, and the one-step estimate of the clean image may be inaccurate, especially in the early stages of the generation process. As a result, the guidance in the early time steps is inaccurate. To overcome this problem, we propose Symplectic Adjoint Guidance (SAG), which calculates the gradient guidance in two inner stages. First, SAG estimates the clean image via $n$ function calls, where $n$ is a flexible hyperparameter that can be tailored to meet specific image quality requirements. Second, SAG uses the symplectic adjoint method to obtain the gradients accurately and with low memory requirements. Extensive experiments demonstrate that SAG generates higher-quality images than the baselines in both guided image and video generation tasks.
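To illustrate the first stage only, here is a minimal sketch of a one-step clean-image estimate next to an estimate built from $n$ function calls. It is not the authors' implementation. It assumes a deterministic DDIM-style update, a hypothetical linear noise schedule `alpha_bar`, and a toy closed-form noise predictor `toy_eps` for scalar Gaussian data; none of these names or choices come from the paper. The symplectic adjoint gradient computation of the second stage is not shown.

```python
import math

MU, S2 = 2.0, 0.25  # toy data distribution N(MU, S2); an assumption for illustration


def alpha_bar(t):
    """Hypothetical linear schedule on t in [0, 1], with alpha_bar(0) = 1."""
    return 1.0 - 0.99 * t


def toy_eps(x, t):
    """Closed-form noise prediction for scalar Gaussian data (stand-in for a network)."""
    ab = alpha_bar(t)
    return math.sqrt(1.0 - ab) * (x - math.sqrt(ab) * MU) / (ab * S2 + 1.0 - ab)


def one_step_estimate(x_t, t, eps_fn):
    """Clean-image estimate from a single function call at time t."""
    ab = alpha_bar(t)
    return (x_t - math.sqrt(1.0 - ab) * eps_fn(x_t, t)) / math.sqrt(ab)


def n_step_estimate(x_t, t, eps_fn, n):
    """Clean-image estimate from n function calls via deterministic DDIM-style steps."""
    times = [t * (1.0 - i / n) for i in range(n + 1)]  # from t down to 0
    x = x_t
    for cur, nxt in zip(times[:-1], times[1:]):
        eps = eps_fn(x, cur)  # one function call per step
        x0_hat = (x - math.sqrt(1.0 - alpha_bar(cur)) * eps) / math.sqrt(alpha_bar(cur))
        ab_next = alpha_bar(nxt)
        x = math.sqrt(ab_next) * x0_hat + math.sqrt(1.0 - ab_next) * eps
    return x


if __name__ == "__main__":
    x_t, t = 0.7, 0.9  # a noisy sample early in generation (large t)
    print("one-step:", one_step_estimate(x_t, t, toy_eps))
    for n in (1, 5, 20):
        print(f"n={n}:", n_step_estimate(x_t, t, toy_eps, n))
```

With `n = 1` the multi-step routine reduces to the one-step estimate, so in this sketch $n$ acts as a knob that trades extra function calls for a finer integration of the sampling trajectory.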