LG · SY · Nov 21, 2020

On the Convergence of Reinforcement Learning in Nonlinear Continuous State Space Problems

arXiv:2011.10829v2
AI Analysis

This paper addresses a fundamental theoretical limitation of Reinforcement Learning that matters to researchers and practitioners working with nonlinear continuous systems. It explains why global solutions are difficult to obtain and points toward accurate local solutions instead.

This paper investigates the convergence of Reinforcement Learning (RL) in nonlinear continuous state space problems. It identifies a "Curse of Variance": the variance of the solution grows factorial-exponentially with the order of the approximation. To keep that variance under control and remain accurate, RL is limited to "local" feedback solutions. The authors also show that deterministic optimal control has a perturbation structure, in which higher-order terms do not affect the calculation of lower-order terms, and that RL can exploit this structure to obtain accurate local solutions.

We consider the problem of Reinforcement Learning for nonlinear stochastic dynamical systems. We show that in the RL setting, there is an inherent "Curse of Variance" in addition to Bellman's infamous "Curse of Dimensionality". In particular, we show that the variance in the solution grows factorial-exponentially in the order of the approximation. A fundamental consequence is that this precludes the search for anything other than "local" feedback solutions in RL, in order to control the explosive variance growth and thus ensure accuracy. We further show that the deterministic optimal control has a perturbation structure, in that the higher order terms do not affect the calculation of lower order terms, which can be utilized in RL to get accurate local solutions.
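The following is a toy illustration of how factorial-type variance growth can arise, not the paper's derivation. It rests on an assumption of ours: a sampled quantity that is a degree-k monomial of a zero-mean Gaussian x ~ N(0, sigma^2). Its variance is E[x^(2k)] - E[x^k]^2, whose leading term (2k-1)!! * sigma^(2k) combines a double factorial with an exponential in k. The function names below are hypothetical.

```python
from math import prod


def double_factorial(n: int) -> int:
    """Return n!! = n * (n - 2) * ... down to 1 or 2, with n!! = 1 for n <= 0."""
    return prod(range(n, 0, -2)) if n > 0 else 1


def gaussian_moment(k: int, sigma: float = 1.0) -> float:
    """E[x^k] for x ~ N(0, sigma^2): zero for odd k, (k-1)!! * sigma^k for even k."""
    if k % 2 == 1:
        return 0.0
    return double_factorial(k - 1) * sigma ** k


def monomial_variance(k: int, sigma: float = 1.0) -> float:
    """Var[x^k] = E[x^(2k)] - (E[x^k])^2 for x ~ N(0, sigma^2)."""
    return gaussian_moment(2 * k, sigma) - gaussian_moment(k, sigma) ** 2


# Exact variance of x^k for orders k = 1..6 with sigma = 1 (illustrative assumption).
variances = [monomial_variance(k) for k in range(1, 7)]
for k, v in enumerate(variances, start=1):
    print(f"order {k}: Var[x^{k}] = {v:g}")
```

With sigma = 1 the variances for orders 1 through 6 are 1, 2, 15, 96, 945, and 10170. Shrinking sigma, which corresponds to restricting attention to a smaller neighbourhood, scales the order-k variance by sigma^(2k). This is one informal way to read the abstract's argument that only "local" solutions keep the variance manageable.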

