The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations
This work serves as a tutorial for researchers and practitioners in reinforcement learning, offering a consolidated resource for understanding and implementing policy gradient methods. Its contribution is incremental: it synthesizes existing knowledge and does not introduce new algorithms.
The paper provides a comprehensive overview of on-policy policy gradient algorithms in deep reinforcement learning, comparing prominent methods on continuous control environments and offering insights into regularization benefits, with all code made publicly available.
In recent years, various powerful policy gradient algorithms have been proposed in deep reinforcement learning. While all these algorithms build on the Policy Gradient Theorem, the specific design choices differ significantly across algorithms. We provide a holistic overview of on-policy policy gradient algorithms to facilitate the understanding of both their theoretical foundations and their practical implementations. In this overview, we include a detailed proof of the continuous version of the Policy Gradient Theorem, convergence results and a comprehensive discussion of practical algorithms. We compare the most prominent algorithms on continuous control environments and provide insights on the benefits of regularization. All code is available at https://github.com/Matt00n/PolicyGradientsJax.
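For orientation, the Policy Gradient Theorem that these algorithms build on can be written in its common textbook form as below. This is a sketch in standard notation, not necessarily the notation or exact statement used in the paper: the symbols J (expected return), d^{\pi_\theta} (discounted state visitation distribution under the policy), and Q^{\pi_\theta} (action-value function) are assumptions introduced here for illustration.

```latex
\nabla_\theta J(\theta)
  \;\propto\;
  \mathbb{E}_{s \sim d^{\pi_\theta},\; a \sim \pi_\theta(\cdot \mid s)}
  \left[ Q^{\pi_\theta}(s, a)\, \nabla_\theta \log \pi_\theta(a \mid s) \right]
```

Because the expectation is taken over states and actions generated by the current policy \pi_\theta, estimating this gradient from samples requires fresh data from that policy, which is what makes the resulting algorithms on-policy.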