Is RL fine-tuning harder than regression? A PDE learning approach for diffusion models
This work addresses efficient fine-tuning of diffusion models, offering machine learning practitioners a method that reduces the fine-tuning problem to supervised regression. The contribution appears incremental, since it builds on existing PDE and control theory frameworks.
The paper tackles the problem of learning optimal control policies for fine-tuning diffusion processes. It develops algorithms that solve a variational inequality problem derived from the Hamilton-Jacobi-Bellman (HJB) equations, and it proves sharp statistical rates for the learned value function and control policy. It shows that fine-tuning can be achieved via supervised regression, with faster statistical rate guarantees than generic reinforcement learning.
We study the problem of learning the optimal control policy for fine-tuning a given diffusion process, using general value function approximation. We develop a new class of algorithms by solving a variational inequality problem based on the Hamilton-Jacobi-Bellman (HJB) equations. We prove sharp statistical rates for the learned value function and control policy, depending on the complexity and approximation errors of the function class. In contrast to generic reinforcement learning problems, our approach shows that fine-tuning can be achieved via supervised regression, with faster statistical rate guarantees.
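To make the link between the HJB equation and regression concrete, the following is a minimal worked sketch in a standard textbook setting. It is an illustration under stated assumptions, not the paper's formulation: the drift $b$, the constant scalar noise level $\sigma$, the terminal reward $r$, and the quadratic control cost are all hypothetical choices made here, and the variational inequality used in the paper is not reproduced.

```latex
% Illustrative setup (an assumption, not taken from the paper):
% reference diffusion with drift b and constant scalar noise level \sigma > 0,
% terminal reward r, and quadratic control cost.
\begin{align*}
  dX_t &= \bigl(b(X_t,t) + \sigma u_t\bigr)\,dt + \sigma\,dW_t, \\
  V(x,t) &= \sup_{u}\; \mathbb{E}\Bigl[\, r(X_T) - \tfrac12 \int_t^T \|u_s\|^2\,ds \,\Bigm|\, X_t = x \Bigr].
\end{align*}
% HJB equation with terminal condition V(x,T) = r(x):
\begin{equation*}
  \partial_t V + b\cdot\nabla V + \tfrac{\sigma^2}{2}\Delta V
  + \sup_{u}\Bigl\{ \sigma\,u\cdot\nabla V - \tfrac12\|u\|^2 \Bigr\} = 0 .
\end{equation*}
% The supremum is attained at u^* = \sigma \nabla V, which gives
\begin{equation*}
  \partial_t V + b\cdot\nabla V + \tfrac{\sigma^2}{2}\Delta V
  + \tfrac{\sigma^2}{2}\|\nabla V\|^2 = 0 .
\end{equation*}
% Substituting \varphi = e^{V} linearizes the equation:
\begin{equation*}
  \partial_t \varphi + b\cdot\nabla\varphi + \tfrac{\sigma^2}{2}\Delta\varphi = 0,
  \qquad \varphi(x,T) = e^{r(x)},
\end{equation*}
% so by Feynman-Kac, with X following the uncontrolled reference diffusion (u = 0):
\begin{equation*}
  V(x,t) = \log \mathbb{E}\bigl[\, e^{r(X_T)} \bigm| X_t = x \bigr].
\end{equation*}
```

In this illustrative setting the value function is the logarithm of a conditional expectation under the reference process, which is the kind of quantity a supervised regression on simulated trajectories can estimate, and the optimal control follows as $u^* = \sigma \nabla V$. The paper's own algorithm, assumptions, and rates may differ from this sketch.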