CV LG | Sep 29, 2023

Directly Fine-Tuning Diffusion Models on Differentiable Rewards

Kevin Clark, Paul Vicol, Kevin Swersky, David J Fleet

arXiv:2309.17400v244.2420 citations | h-index: 79

Originality: Incremental advance

AI Analysis

This work addresses the challenge of aligning diffusion models with specific objectives, a problem relevant to researchers and practitioners in generative AI. It is an incremental advance because it builds on existing gradient-based fine-tuning methods.

The authors tackle the problem of fine-tuning diffusion models to maximize differentiable reward functions, such as human preference scores. They introduce Direct Reward Fine-Tuning (DRaFT) and two more efficient variants, DRaFT-K and DRaFT-LV. These methods outperform reinforcement learning-based approaches and improve the aesthetic quality of images generated by Stable Diffusion 1.4.

We present Direct Reward Fine-Tuning (DRaFT), a simple and effective method for fine-tuning diffusion models to maximize differentiable reward functions, such as scores from human preference models. We first show that it is possible to backpropagate the reward function gradient through the full sampling procedure, and that doing so achieves strong performance on a variety of rewards, outperforming reinforcement learning-based approaches. We then propose more efficient variants of DRaFT: DRaFT-K, which truncates backpropagation to only the last K steps of sampling, and DRaFT-LV, which obtains lower-variance gradient estimates for the case when K=1. We show that our methods work well for a variety of reward functions and can be used to substantially improve the aesthetic quality of images generated by Stable Diffusion 1.4. Finally, we draw connections between our approach and prior work, providing a unifying perspective on the design space of gradient-based fine-tuning algorithms.
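The idea of truncating backpropagation to the last K sampling steps can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not the paper's implementation: the one-parameter "sampler" `x <- decay * x + theta`, the quadratic reward, and the names `sample_with_grad`, `reward_and_grad`, and `fine_tune` are all hypothetical. Instead of an autodiff library, the derivative of the sample with respect to `theta` is carried along by hand, and zeroing it before the last K steps plays the role of a stop-gradient. DRaFT-LV is not covered.

```python
import random


def sample_with_grad(theta, x_T, num_steps, k, decay=0.5):
    """Run a toy deterministic sampler; return (x_0, d x_0 / d theta).

    Toy update (an assumption, not a real diffusion sampler):
        x <- decay * x + theta
    Only the last k steps contribute to the derivative, which mimics a
    stop-gradient placed before them. k == num_steps is full backprop.
    """
    x, dx = x_T, 0.0
    for step in range(num_steps):
        if step == num_steps - k:
            dx = 0.0  # stop-gradient: treat earlier steps as constants
        x, dx = decay * x + theta, decay * dx + 1.0
    return x, dx


def reward_and_grad(x, target=1.0):
    """Hypothetical differentiable reward and its derivative w.r.t. x."""
    return -(x - target) ** 2, -2.0 * (x - target)


def fine_tune(k, num_steps=10, iters=200, batch=8, lr=0.1, seed=0):
    """Gradient ascent on the mean reward of the final samples."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(iters):
        grad = 0.0
        for _ in range(batch):
            x_T = rng.gauss(0.0, 1.0)  # initial noise
            x0, dx0 = sample_with_grad(theta, x_T, num_steps, k)
            _, dr = reward_and_grad(x0)
            grad += dr * dx0 / batch  # chain rule: dr/dx0 * dx0/dtheta
        theta += lr * grad
    return theta


if __name__ == "__main__":
    print("full backprop (k=10):", fine_tune(k=10))
    print("truncated     (k=1): ", fine_tune(k=1))
```

In this toy problem the truncated gradient has a smaller magnitude than the full one but the same sign, so both settings drive `theta` to the same reward-maximizing value. That behavior is specific to the toy and says nothing about how the variants compare on real diffusion models.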
