OC LG, Mar 28

Adjoint Matching through the Lens of the Stochastic Maximum Principle in Optimal Control

arXiv:2604.08580 89.71 citations h-index: 10

AI Analysis

For researchers working on reward fine-tuning of generative models and sampling from tilted distributions, this work provides a theoretically grounded and practical method for solving stochastic optimal control problems, though it is an incremental theoretical contribution that generalizes existing work.

The paper places Adjoint Matching, a method for learning optimal controls in diffusion and flow models, on a rigorous theoretical foundation by deriving it from the Stochastic Maximum Principle (SMP). It shows that the adjoint matching loss has the same first variation as the original stochastic optimal control objective, and that its critical points satisfy HJB stationarity conditions, providing a practical alternative to classical SMP-based algorithms.

Reward fine-tuning of diffusion and flow models and sampling from tilted or Boltzmann distributions can both be formulated as stochastic optimal control (SOC) problems, where learning an optimal generative dynamics corresponds to optimizing a control under SDE constraints. In this work, we revisit and generalize Adjoint Matching, a recently proposed SOC-based method for learning optimal controls, and place it on a rigorous footing by deriving it from the Stochastic Maximum Principle (SMP). We formulate a general Hamiltonian adjoint matching objective for SOC problems with control-dependent drift and diffusion and convex running costs, and show that its expected value has the same first variation as the original SOC objective. As a consequence, critical points satisfy the Hamilton-Jacobi-Bellman (HJB) stationarity conditions. In the important practical case of state- and control-independent diffusion, we recover the lean adjoint matching loss introduced in the original Adjoint Matching work, which avoids second-order terms and whose critical points coincide with the optimal control under mild uniqueness assumptions. Finally, we show that adjoint matching can be precisely interpreted as a continuous-time method of successive approximations induced by the SMP, yielding a practical and implementable alternative to classical SMP-based algorithms, which are obstructed by intractable martingale terms in the stochastic setting. These results are also of independent interest to the stochastic control community, providing new implementable objectives and a viable pathway for SMP-based iterations in stochastic problems.
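
To make the lean adjoint matching loss mentioned in the abstract more concrete, here is a minimal NumPy sketch for a scalar toy problem. It is not taken from the paper. It assumes the commonly stated setup with controlled dynamics dX = (b + sigma * u) dt + sigma dB, running cost 0.5 * u^2 + f, terminal cost g, and constant (state- and control-independent) sigma. Under those assumptions the lean adjoint solves the backward ODE da/dt = -(db/dx * a + df/dx) with a(1) = dg/dx(X_1), and the loss regresses u(X_t, t) onto -sigma * a(t). All function names and the toy choices b(x) = -x, f = 0, g(x) = x are hypothetical and chosen only because the optimal control is known in closed form for this linear case, u*(x, t) = -sigma * exp(t - 1).

```python
import numpy as np

def simulate(u, b, sigma, x0, n_steps, rng):
    """Euler-Maruyama rollout of dX = (b + sigma * u) dt + sigma dB on [0, 1]."""
    dt = 1.0 / n_steps
    xs = np.empty((n_steps + 1,) + x0.shape)
    xs[0] = x0
    for k in range(n_steps):
        t = k * dt
        drift = b(xs[k], t) + sigma * u(xs[k], t)
        noise = rng.standard_normal(x0.shape)
        xs[k + 1] = xs[k] + drift * dt + sigma * np.sqrt(dt) * noise
    return xs

def lean_adjoint(xs, db_dx, df_dx, dg_dx):
    """Backward Euler-style sweep for da/dt = -(db/dx * a + df/dx), a(1) = dg/dx(X_1).

    No second-order (Hessian) terms and no control-dependent terms appear here.
    """
    n_steps = xs.shape[0] - 1
    dt = 1.0 / n_steps
    a = np.empty_like(xs)
    a[-1] = dg_dx(xs[-1])
    for k in range(n_steps, 0, -1):
        t = k * dt
        a[k - 1] = a[k] + dt * (db_dx(xs[k], t) * a[k] + df_dx(xs[k], t))
    return a

def lean_am_loss(u, xs, a, sigma):
    """Monte Carlo estimate of 0.5 * E[ integral of (u(X_t, t) + sigma * a_t)^2 dt ].

    The trajectories xs and adjoints a are treated as fixed data (no gradient
    would flow through them in an autodiff implementation).
    """
    n_steps = xs.shape[0] - 1
    dt = 1.0 / n_steps
    resid = np.stack([u(xs[k], k * dt) + sigma * a[k] for k in range(n_steps)])
    return 0.5 * np.mean(np.sum(resid ** 2, axis=0) * dt)

# Toy problem (hypothetical): b(x) = -x, f = 0, g(x) = x, sigma = 1.
sigma = 1.0
n_steps = 1000
batch = 64
b = lambda x, t: -x
db_dx = lambda x, t: -np.ones_like(x)
df_dx = lambda x, t: np.zeros_like(x)
dg_dx = lambda x: np.ones_like(x)

# Closed-form optimal control for this linear toy case, and a zero baseline.
u_star = lambda x, t: -sigma * np.exp(t - 1.0) * np.ones_like(x)
u_zero = lambda x, t: np.zeros_like(x)

rng = np.random.default_rng(0)
xs = simulate(u_star, b, sigma, np.zeros(batch), n_steps, rng)
a = lean_adjoint(xs, db_dx, df_dx, dg_dx)

loss_opt = lean_am_loss(u_star, xs, a, sigma)   # close to zero at the optimum
loss_zero = lean_am_loss(u_zero, xs, a, sigma)  # clearly positive for u = 0
print(loss_opt, loss_zero)
```

In this linear example the lean adjoint does not depend on the sampled path, so the regression target is deterministic; for nonlinear drifts or costs it would vary per trajectory, and a practical implementation would replace the closed-form `u_star` with a neural network trained by gradient descent on this loss.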
