OC RO SY

Mar 26, 2021

Value Function Estimators for Feynman-Kac Forward-Backward SDEs in Stochastic Optimal Control

Kelsey P. Hawkins, Ali Pakniyat, Panagiotis Tsiotras

arXiv:2103.14246v24.07 citations
h-index: 56

Originality: Incremental advance

AI Analysis

This work addresses a computational bottleneck in stochastic optimal control and reinforcement learning, offering incremental improvements in estimator accuracy for specific domains.

The authors address the numerical solution of the forward-backward stochastic differential equations (FBSDEs) that arise in stochastic optimal control. They propose two numerical estimators derived from a discrete-time approximation of the on-policy value function, rather than from a discretization of the continuous-time FBSDE. The estimators are reported to be significantly more accurate than Euler-Maruyama-based estimators, reaching near machine-precision accuracy on linear quadratic regulator (LQR) problems.

Two novel numerical estimators are proposed for solving forward-backward stochastic differential equations (FBSDEs) appearing in the Feynman-Kac representation of the value function in stochastic optimal control problems. In contrast to the current numerical approaches which are based on the discretization of the continuous-time FBSDE, we propose a converse approach, namely, we obtain a discrete-time approximation of the on-policy value function, and then we derive a discrete-time estimator that resembles the continuous-time counterpart. The proposed approach allows for the construction of higher accuracy estimators along with error analysis. The approach is applied to the policy improvement step in reinforcement learning. Numerical results and error analysis are demonstrated using (i) a scalar nonlinear stochastic optimal control problem and (ii) a four-dimensional linear quadratic regulator (LQR) problem. The proposed estimators show significant improvement in terms of accuracy in both cases over Euler-Maruyama-based estimators used in competing approaches. In the case of LQR problems, we demonstrate that our estimators result in near machine-precision level accuracy, in contrast to previously proposed methods that can potentially diverge on the same problems.
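For orientation, the following is a minimal sketch of the kind of Euler-Maruyama baseline the abstract compares against, not the authors' proposed estimators. It assumes a scalar closed-loop SDE dX = f(X) dt + sigma dW under a fixed policy and estimates the on-policy value V(x0) = E[ integral of running cost + terminal cost ] by Monte Carlo, which is the expectation that the Feynman-Kac representation expresses. The function name `em_value_estimate`, its parameters, and the test dynamics are all hypothetical and chosen only for illustration.

```python
import numpy as np

def em_value_estimate(x0, drift, sigma, running_cost, terminal_cost,
                      horizon, n_steps, n_paths, seed=0):
    """Monte Carlo estimate of an on-policy value using Euler-Maruyama paths.

    Illustrative baseline only (hypothetical helper): the SDE is discretized
    first, and the cost is then accumulated along the discretized paths.
    """
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    x = np.full(n_paths, float(x0))   # all paths start at x0
    cost = np.zeros(n_paths)
    for _ in range(n_steps):
        # Left-endpoint quadrature of the running cost.
        cost += running_cost(x) * dt
        # Brownian increment with variance dt.
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        # Euler-Maruyama step for dX = drift(X) dt + sigma dW.
        x = x + drift(x) * dt + sigma * dw
    cost += terminal_cost(x)
    return cost.mean()

# Assumed test problem: Ornstein-Uhlenbeck dynamics dX = -X dt + 0.5 dW,
# zero running cost, terminal cost X_T^2, for which E[X_T^2] is known.
estimate = em_value_estimate(
    x0=1.0,
    drift=lambda x: -x,
    sigma=0.5,
    running_cost=lambda x: np.zeros_like(x),
    terminal_cost=lambda x: x ** 2,
    horizon=1.0,
    n_steps=200,
    n_paths=20000,
)
closed_form = np.exp(-2.0) + 0.125 * (1.0 - np.exp(-2.0))
print(estimate, closed_form)
```

In this sketch the estimate carries both a time-discretization bias of order dt and Monte Carlo sampling error. The abstract's approach instead starts from a discrete-time approximation of the value function and derives the estimator from that.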
