OC LG PR Jun 10, 2020

Gradient Flows for Regularized Stochastic Control Problems

arXiv:2006.05956v511.922 citations

Originality: Highly original

AI Analysis

This paper provides a theoretical underpinning for the convergence of stochastic gradient algorithms in reinforcement learning, a foundational problem for the ML/AI community.

The paper tackles stochastic control problems with probability measure action spaces and relative entropy penalties by constructing a gradient flow on a metric space to decrease the cost functional, showing that invariant measures satisfy the Pontryagin principle and achieving exponential convergence under convexity conditions.

This paper studies stochastic control problems with the action space taken to be probability measures, with the objective penalised by the relative entropy. We identify a suitable metric space on which we construct a gradient flow for the measure-valued control process, in the set of admissible controls, along which the cost functional is guaranteed to decrease. It is shown that any invariant measure of this gradient flow satisfies the Pontryagin optimality principle. If the problem we work with is sufficiently convex, the gradient flow converges exponentially fast. Furthermore, the optimal measure-valued control process admits a Bayesian interpretation, which means that one can incorporate prior knowledge when solving such stochastic control problems. This work is motivated by a desire to extend the theoretical underpinning for the convergence of stochastic gradient type algorithms widely employed in the reinforcement learning community to solve control problems.
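The role of the relative entropy penalty and the Bayesian reading of the optimum can be illustrated on a toy problem. The sketch below is not the paper's construction: it assumes a finite action set, a static cost with no controlled dynamics, and a mirror-descent-style discretisation on the probability simplex in place of the paper's gradient flow and metric. The names `cost`, `prior`, `tau` and `dt` are hypothetical. In this simplified setting the objective is the expected cost plus `tau` times the relative entropy to `prior`, and its minimiser reweights the prior by `exp(-cost / tau)`, much as a likelihood reweights a prior.

```python
import math

def gibbs(cost, prior, tau):
    """Closed-form minimiser of the toy objective: m*(a) proportional to prior(a) * exp(-cost(a) / tau)."""
    weights = [p * math.exp(-c / tau) for c, p in zip(cost, prior)]
    total = sum(weights)
    return [w / total for w in weights]

def objective(m, cost, prior, tau):
    """Expected cost under m plus tau times the relative entropy KL(m || prior)."""
    expected = sum(mi * ci for mi, ci in zip(m, cost))
    kl = sum(mi * math.log(mi / pi) for mi, pi in zip(m, prior) if mi > 0)
    return expected + tau * kl

def flow_step(m, cost, prior, tau, dt):
    """One explicit multiplicative step of a mirror-descent-style flow (an assumed stand-in)."""
    # First variation of the objective with respect to m, up to an additive constant.
    grad = [ci + tau * math.log(mi / pi) for mi, ci, pi in zip(m, cost, prior)]
    weights = [mi * math.exp(-dt * gi) for mi, gi in zip(m, grad)]
    total = sum(weights)  # renormalise so the iterate stays a probability vector
    return [w / total for w in weights]

# Hypothetical toy data: three actions, a strictly positive prior, entropy weight tau.
cost = [1.0, 0.2, 0.7]
prior = [0.5, 0.3, 0.2]
tau = 0.5

m = list(prior)  # start the flow at the prior
values = [objective(m, cost, prior, tau)]
for _ in range(200):
    m = flow_step(m, cost, prior, tau, dt=0.1)
    values.append(objective(m, cost, prior, tau))

m_star = gibbs(cost, prior, tau)
print("flow limit :", [round(x, 4) for x in m])
print("Gibbs form :", [round(x, 4) for x in m_star])
```

In this toy setting each step contracts the log-density towards the Gibbs form by a factor of `1 - dt * tau`, which is a simple finite-dimensional analogue of exponential convergence under convexity; it makes no claim about the rates or conditions established in the paper.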
