Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening
Diffusion-Sharpening addresses the challenge of efficiently aligning diffusion models for applications such as text-to-image generation, offering a scalable solution that builds incrementally on prior fine-tuning and trajectory optimization methods.
The paper tackles the problem of fine-tuning diffusion models for better downstream alignment by proposing Diffusion-Sharpening, which optimizes sampling trajectories during training to improve efficiency and performance. It outperforms existing methods such as Diffusion-DPO and Inference Scaling in text alignment, compositional capabilities, and human preferences, without adding inference cost.
We propose Diffusion-Sharpening, a fine-tuning approach that enhances downstream alignment by optimizing sampling trajectories. Existing RL-based fine-tuning methods focus on single training timesteps and neglect trajectory-level alignment, while recent sampling trajectory optimization methods incur significant inference cost in the number of function evaluations (NFE). Diffusion-Sharpening overcomes both limitations by using a path integral framework to select optimal trajectories during training, leveraging reward feedback and amortizing inference costs. Our method demonstrates superior training efficiency with faster convergence, and optimal inference efficiency without requiring additional NFEs. Extensive experiments show that Diffusion-Sharpening outperforms RL-based fine-tuning methods (e.g., Diffusion-DPO) and sampling trajectory optimization methods (e.g., Inference Scaling) across diverse metrics including text alignment, compositional capabilities, and human preferences, offering a scalable and efficient solution for future diffusion model fine-tuning. Code: https://github.com/Gen-Verse/Diffusion-Sharpening
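The idea of selecting trajectories during training with reward feedback can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce the path integral framework: it only shows a plain best-of-K selection over sampled denoising trajectories. The denoising step, the reward function, and all names (`toy_denoise_step`, `toy_reward`, `select_trajectories`) are hypothetical stand-ins chosen for illustration.

```python
import random

def toy_denoise_step(x, t, rng):
    """Hypothetical stand-in for one stochastic sampler step (not a real model)."""
    # Shrink the state toward zero and add noise scaled by the timestep.
    return [0.9 * v + 0.1 * t * rng.gauss(0.0, 1.0) for v in x]

def toy_reward(x, target):
    """Hypothetical reward: negative squared distance to a target vector."""
    return -sum((v - g) ** 2 for v, g in zip(x, target))

def sample_trajectory(x_start, timesteps, rng):
    """Roll out one denoising trajectory and keep every intermediate state."""
    states = [list(x_start)]
    for t in timesteps:
        states.append(toy_denoise_step(states[-1], t, rng))
    return states

def select_trajectories(x_start, timesteps, target, num_candidates, rng):
    """Sample several trajectories from one start and rank them by final reward."""
    candidates = [
        sample_trajectory(x_start, timesteps, rng) for _ in range(num_candidates)
    ]
    rewards = [toy_reward(traj[-1], target) for traj in candidates]
    best = max(range(num_candidates), key=rewards.__getitem__)
    worst = min(range(num_candidates), key=rewards.__getitem__)
    return candidates[best], candidates[worst], rewards

# Usage: 8 candidate trajectories over 4 steps, all from the same noisy start.
rng = random.Random(0)
timesteps = [1.0, 0.75, 0.5, 0.25]
target = [0.0] * 4
x_start = [rng.gauss(0.0, 1.0) for _ in range(4)]
best_traj, worst_traj, rewards = select_trajectories(
    x_start, timesteps, target, num_candidates=8, rng=rng
)
```

In a training loop, one plausible use (an assumption here, not a detail stated above) is to treat the highest-reward trajectory as a supervised target, or the best and worst trajectories as a preference pair. Because the search over candidates happens only during training, sampling from the fine-tuned model would need no extra function evaluations, which is the amortization the abstract describes.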