LG · AI · Sep 12, 2024

Scores as Actions: a framework of fine-tuning diffusion models by continuous-time reinforcement learning

arXiv:2409.08400v1 · 18 citations · h-index: 8
AI Analysis

The paper provides a theoretical framework for improving text-to-image generation models. It appears incremental, since it builds on existing RLHF approaches for diffusion models.

The paper tackles the problem of aligning diffusion models with human intent. It formulates fine-tuning as a continuous-time stochastic control problem and treats the score-matching functions as actions, so that reinforcement learning can be used to improve generation quality.

Reinforcement learning from human feedback (RLHF) has been shown to be a promising direction for aligning generative models with human intent, and has also been explored in recent works for the alignment of diffusion generative models. In this work, we provide a rigorous treatment by formulating the task of fine-tuning diffusion models, with reward functions learned from human feedback, as an exploratory continuous-time stochastic control problem. Our key idea lies in treating the score-matching functions as controls/actions; building on this, we develop a unified framework from a continuous-time perspective for employing reinforcement learning (RL) algorithms to improve the generation quality of diffusion models. We also develop the corresponding continuous-time RL theory for policy optimization and regularization under the assumption of environments driven by stochastic differential equations (SDEs). Experiments on text-to-image (T2I) generation will be reported in the accompanying paper.
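A minimal sketch of the "scores as actions" idea in a toy one-dimensional setting, assuming an Ornstein-Uhlenbeck forward process dX_t = -X_t dt + √2 dB_t with N(0, 1) data, so that the pretrained score is exactly s(t, y) = -y. The reverse-time sampler dY_t = (Y_t + 2 a_t) dt + √2 dB_t uses the score as its action a_t, and choosing a_t = s(T - t, Y_t) recovers the pretrained model. The objective trades a terminal reward r(Y_T) against the penalty β E ∫ |a_t - s(T - t, Y_t)|² dt, which in this setup equals (by Girsanov's theorem) the KL divergence between the fine-tuned and pretrained path measures. The reward, the one-parameter action family a = s + θ, the value of β, and all function names are hypothetical, and the grid search merely stands in for the paper's continuous-time policy optimization, which this sketch does not implement (it also omits the exploratory, randomized-policy aspect).

```python
import numpy as np


def pretrained_score(t, y):
    # Assumed toy setting: data ~ N(0, 1) and OU forward process
    # dX = -X dt + sqrt(2) dB, so every marginal is N(0, 1) and the score is -y.
    return -y


def reward(y, target=2.0):
    # Hypothetical reward standing in for a model learned from human feedback.
    return -(y - target) ** 2


def objective(theta, noise, y0, T=1.0, beta=0.5):
    """Monte Carlo estimate of E[r(Y_T)] - beta * E[int_0^T |a - s_pre|^2 dt]
    for the hypothetical action a(t, y) = s_pre(T - t, y) + theta."""
    n_steps = noise.shape[0]
    h = T / n_steps
    y = y0.copy()
    penalty = np.zeros_like(y)
    for k in range(n_steps):
        t = k * h
        s = pretrained_score(T - t, y)
        a = s + theta  # the action takes the place of the score in the reverse SDE
        penalty += (a - s) ** 2 * h  # running KL-type regularizer
        # Euler-Maruyama step of dY = (Y + 2a) dt + sqrt(2) dB
        y = y + (y + 2.0 * a) * h + np.sqrt(2.0 * h) * noise[k]
    return float(np.mean(reward(y)) - beta * np.mean(penalty))


rng = np.random.default_rng(0)
n_paths, n_steps = 20_000, 200
noise = rng.standard_normal((n_steps, n_paths))  # Brownian increments, shared across thetas
y0 = rng.standard_normal(n_paths)  # reverse process starts from N(0, 1)

thetas = np.linspace(0.0, 2.5, 11)
scores = [objective(th, noise, y0) for th in thetas]
best_theta = thetas[int(np.argmax(scores))]
print(f"best theta on grid: {best_theta:.2f}")
```

Because the dynamics are linear and Gaussian, the sketch can be checked in closed form: Y_T ~ N(2θ(1 - e^{-T}), 1), so the grid maximizer should land near θ* = 2kc / (4k² + βT) with k = 1 - e^{-T} and reward target c, which is about 1.2 for T = 1, c = 2, β = 0.5. Reusing one set of Brownian increments for every θ (common random numbers) keeps the comparison between candidates low-noise.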
