Global Convergence Using Policy Gradient Methods for Model-free Markovian Jump Linear Quadratic Control
This work addresses a foundational control problem for reinforcement learning practitioners, but it is incremental as it extends existing convergence proofs to a more complex setting.
The paper tackles the problem of proving global convergence for gradient-based policy optimization methods in model-free Markovian jump linear quadratic control, overcoming challenges from multiple states and unknown dynamics, and demonstrates convergence using gradient descent and natural policy gradient methods with simulation validation.
Owing to the growth of interest in reinforcement learning over the last few years, gradient-based policy optimization methods have been gaining popularity for control problems as well. And rightly so, since policy gradient methods have the advantage of optimizing a metric of interest in an end-to-end manner, along with being relatively easy to implement without complete knowledge of the underlying system. In this paper, we study the global convergence of gradient-based policy optimization methods for quadratic control of discrete-time, model-free Markovian jump linear systems (MJLS). We surmount myriad challenges that arise from having more than one state, coupled with a lack of knowledge of the system dynamics, and show global convergence of the policy using gradient descent and natural policy gradient methods. We also provide simulation studies to corroborate our claims.
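To make the model-free setting concrete, the following is a minimal sketch of how a policy gradient for an MJLS could be estimated from simulated rollouts alone. It is not taken from the paper. It assumes a mode-dependent linear policy u_t = -K[mode] x_t and uses a one-point smoothed (zeroth-order) estimator of the kind commonly used for model-free LQR. The two-mode system (A, B, Q, R, P), the function names, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def rollout_cost(K, A, B, Q, R, P, horizon, rng):
    """Finite-horizon quadratic cost of one simulated trajectory
    under the mode-dependent policy u_t = -K[mode] @ x_t."""
    n_modes, n = A.shape[0], A.shape[1]
    x = rng.standard_normal(n)        # random initial state
    mode = rng.integers(n_modes)      # random initial mode
    cost = 0.0
    for _ in range(horizon):
        u = -K[mode] @ x
        cost += x @ Q[mode] @ x + u @ R[mode] @ u
        x = A[mode] @ x + B[mode] @ u
        mode = rng.choice(n_modes, p=P[mode])   # Markov jump to the next mode
    return cost

def zeroth_order_gradient(K, A, B, Q, R, P, radius=0.05,
                          n_samples=200, horizon=30, seed=0):
    """One-point smoothed gradient estimate. Only rollout costs are used;
    the estimator itself never reads A, B, or P (they drive the simulator)."""
    rng = np.random.default_rng(seed)
    d = K.size
    grad = np.zeros_like(K)
    for _ in range(n_samples):
        U = rng.standard_normal(K.shape)
        U *= radius / np.linalg.norm(U)          # perturbation on a sphere of given radius
        c = rollout_cost(K + U, A, B, Q, R, P, horizon, rng)
        grad += (d / radius**2) * c * U
    return grad / n_samples

# Hypothetical two-mode system, used only as a black-box simulator.
A = np.array([[[0.9, 0.1], [0.0, 0.8]],
              [[0.7, 0.0], [0.2, 0.6]]])
B = np.array([[[0.0], [1.0]],
              [[1.0], [0.0]]])
Q = np.stack([np.eye(2), np.eye(2)])
R = np.stack([np.eye(1), np.eye(1)])
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])            # mode transition probabilities (rows sum to 1)

K0 = np.zeros((2, 1, 2))              # one gain matrix per mode
grad = zeroth_order_gradient(K0, A, B, Q, R, P)
K1 = K0 - 1e-4 * grad                 # one plain gradient descent step (step size is arbitrary)
```

The estimate is noisy for small sample counts, so this sketch only illustrates the mechanics of a model-free update. It makes no claim about the step sizes, sample sizes, or estimator for which the paper establishes convergence.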