ShortFT: Diffusion Model Alignment via Shortcut-based Fine-Tuning
This work addresses the computational inefficiency and suboptimal results of backpropagation-based diffusion model alignment, a concern for both researchers and practitioners. It appears incremental, however, since it builds on existing trajectory-preserving few-step diffusion models.
The paper tackles the problem of efficiently aligning diffusion models with reward functions. It introduces Shortcut-based Fine-Tuning (ShortFT), which backpropagates through a shorter denoising chain to reduce computational cost and the risk of gradient explosion. The authors report significantly improved alignment performance that surpasses state-of-the-art alternatives.
Backpropagation-based approaches aim to align diffusion models with reward functions through end-to-end backpropagation of the reward gradient within the denoising chain, offering a promising perspective. However, due to the computational costs and the risk of gradient explosion associated with the lengthy denoising chain, existing approaches struggle to achieve complete gradient backpropagation, leading to suboptimal results. In this paper, we introduce Shortcut-based Fine-Tuning (ShortFT), an efficient fine-tuning strategy that utilizes a shorter denoising chain. More specifically, we employ a recently proposed trajectory-preserving few-step diffusion model, which enables a shortcut over the original denoising chain, and construct a shorter, shortcut-based denoising chain. Optimizing on this chain notably enhances the efficiency and effectiveness of fine-tuning the foundational model. Our method has been rigorously tested and can be effectively applied to various reward functions, significantly improving alignment performance and surpassing state-of-the-art alternatives.
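The chain-length argument can be illustrated with a toy calculation. The sketch below is not the paper's implementation: it assumes a hypothetical scalar, linear "denoising step" whose Jacobian is a constant step_gain, and the function name chain_gradient and the values 1.2, 50, and 4 are chosen only for illustration. It shows how the end-to-end gradient, a product of per-step Jacobians, can grow with the number of steps that are backpropagated through.

```python
def chain_gradient(step_gain: float, num_steps: int) -> float:
    """Gradient of the final sample w.r.t. the initial input for a toy
    linear chain x_{k+1} = step_gain * x_k (hypothetical stand-in for a
    denoising step; real denoisers are nonlinear networks)."""
    grad = 1.0
    for _ in range(num_steps):
        # Chain rule: each step multiplies the gradient by its Jacobian.
        grad *= step_gain
    return grad


# Assumed values for illustration only.
long_chain = chain_gradient(step_gain=1.2, num_steps=50)  # lengthy chain
short_chain = chain_gradient(step_gain=1.2, num_steps=4)  # shorter chain

print(f"50-step gradient magnitude: {long_chain:.1f}")
print(f"4-step gradient magnitude:  {short_chain:.4f}")
```

With these assumed values the 50-step gradient is thousands of times larger than the 4-step one, and the backward pass touches 50 steps instead of 4. In a real few-step model each step is a different mapping with its own Jacobian, so this only conveys the scaling intuition, not the actual behavior of ShortFT.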