LG AI OC Feb 3, 2025

Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning

Hanyang Zhao, Haoxian Chen, Ji Zhang, David D. Yao, Wenpin Tang

arXiv:2502.01819v3 28.529 citations h-index: 8 ICML

Originality: Incremental advance

AI Analysis

This work addresses the challenge of aligning diffusion models with human feedback for reliable generative AI. The method is an incremental advance, but it applies to models with higher-order or black-box solvers, where discrete-time formulations often do not.

The paper tackles the problem of fine-tuning diffusion generative models by developing a continuous-time reinforcement learning approach to reduce discretization errors and improve alignment with input prompts, achieving enhanced performance in downstream tasks with Stable Diffusion v1.5.

Reinforcement learning from human feedback (RLHF), which aligns a diffusion model with the input prompt, has become a crucial step in building reliable generative AI models. Most works in this area use a discrete-time formulation, which is prone to discretization errors and often not applicable to models with higher-order or black-box solvers. The objective of this study is to develop a disciplined approach to fine-tuning diffusion models using continuous-time RL, formulated as a stochastic control problem with a reward function that aligns the end result (terminal state) with the input prompt. The key idea is to treat score matching as controls or actions, thereby making connections to policy optimization and regularization in continuous-time RL. To carry out this idea, we lay out a new policy optimization framework for continuous-time RL, and illustrate its potential in enhancing the design space of value networks by leveraging the structural properties of diffusion models. We validate the advantages of our method through experiments on downstream tasks of fine-tuning the large-scale Text2Image model Stable Diffusion v1.5.
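
The "score as action" idea can be illustrated with a toy one-dimensional sketch. This is not the authors' implementation. The dynamics `dX = sigma^2 * a dt + sigma dB`, the quadratic penalty on the gap between the fine-tuned and pretrained scores, and every name below (`pretrained_score`, `fine_tuned_score`, `reward`, `shift`, `beta`) are assumptions made for illustration only. The sketch simulates the controlled process with Euler-Maruyama and estimates a regularized objective of the form "expected terminal reward minus `beta` times a running penalty".

```python
import math
import random


def pretrained_score(t, x):
    # Hypothetical stand-in for a pretrained score network.
    return -x


def fine_tuned_score(t, x, shift):
    # Hypothetical action: the pretrained score plus a constant shift.
    return pretrained_score(t, x) + shift


def reward(x):
    # Hypothetical terminal reward that prefers samples near 1.0.
    return -(x - 1.0) ** 2


def rollout(shift, beta, n_steps=100, horizon=1.0, sigma=1.0, rng=None):
    """Simulate one Euler-Maruyama path and return reward minus penalty."""
    rng = rng or random.Random(0)
    dt = horizon / n_steps
    x = rng.gauss(0.0, 1.0)  # initial noise sample
    penalty = 0.0
    for k in range(n_steps):
        t = k * dt
        a = fine_tuned_score(t, x, shift)   # the score plays the role of the action
        a_pre = pretrained_score(t, x)
        # Assumed regularizer: running quadratic gap to the pretrained score.
        penalty += 0.5 * sigma ** 2 * (a - a_pre) ** 2 * dt
        # Assumed controlled dynamics: dX = sigma^2 * a dt + sigma dB.
        x += sigma ** 2 * a * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return reward(x) - beta * penalty


def objective(shift, beta, n_paths=2000, seed=0):
    """Monte Carlo estimate of the regularized objective for a given shift."""
    rng = random.Random(seed)
    total = sum(rollout(shift, beta, rng=rng) for _ in range(n_paths))
    return total / n_paths
```

Calling `objective(0.0, beta)` evaluates the unmodified pretrained score, for which the penalty is zero. Comparing it with `objective(1.0, beta)` for several values of `beta` shows the trade-off between reward gain and deviation from the pretrained model in this toy setting.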
