Riemannian Proximal Policy Optimization
This work addresses policy optimization in reinforcement learning, offering a novel approach with theoretical guarantees, although it appears incremental because it builds on existing proximal and Riemannian methods.
The paper tackles the problem of solving Markov decision processes by proposing a Riemannian proximal optimization algorithm that models policy functions with Gaussian mixture models in a Riemannian space. The algorithm has guaranteed convergence, and preliminary experiments demonstrate its efficacy.
In this paper, we propose a general Riemannian proximal optimization algorithm with guaranteed convergence to solve Markov decision process (MDP) problems. To model policy functions in MDPs, we employ a Gaussian mixture model (GMM) and formulate policy optimization as a nonconvex optimization problem in the Riemannian space of positive semidefinite matrices. For two given policy functions, we also provide a lower bound on policy improvement, using bounds derived from the Wasserstein distance between GMMs. Preliminary experiments show the efficacy of our proposed Riemannian proximal policy optimization algorithm.
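The abstract does not state which Wasserstein bound on GMMs is used. As a minimal sketch, assuming a GMM policy whose components are Gaussians with given means and positive semidefinite covariances, the snippet below computes the closed-form squared 2-Wasserstein (Bures) distance between two Gaussians. It also computes a simple upper bound for two GMMs that share mixture weights, obtained by pairing component k with component k. The helper names (`psd_sqrt`, `gaussian_w2_sq`, `gmm_w2_sq_upper`) are hypothetical, and the matched-component bound is a generic illustration, not the paper's policy-improvement bound.

```python
import numpy as np


def psd_sqrt(mat):
    """Symmetric square root of a PSD matrix via eigendecomposition."""
    sym = 0.5 * (mat + mat.T)            # symmetrize to suppress round-off asymmetry
    vals, vecs = np.linalg.eigh(sym)
    vals = np.clip(vals, 0.0, None)      # clip tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T  # V diag(sqrt(vals)) V^T


def gaussian_w2_sq(m1, s1, m2, s2):
    """Squared 2-Wasserstein distance between N(m1, s1) and N(m2, s2).

    W2^2 = ||m1 - m2||^2 + tr(s1 + s2 - 2 (s2^{1/2} s1 s2^{1/2})^{1/2})
    """
    root_s2 = psd_sqrt(s2)
    cross = psd_sqrt(root_s2 @ s1 @ root_s2)
    bures = np.trace(s1 + s2 - 2.0 * cross)
    return float(np.sum((m1 - m2) ** 2) + max(bures, 0.0))


def gmm_w2_sq_upper(weights, means_a, covs_a, means_b, covs_b):
    """Upper bound on W2^2 between two GMMs with identical mixture weights.

    Hypothetical helper (not from the paper): pairing component k of A with
    component k of B is a feasible coupling of the two mixtures, so the
    weighted sum of per-component Gaussian W2^2 upper-bounds the true value.
    """
    return float(sum(
        w * gaussian_w2_sq(ma, ca, mb, cb)
        for w, ma, ca, mb, cb in zip(weights, means_a, covs_a, means_b, covs_b)
    ))
```

For example, two 2-D Gaussians with identity covariances and means (0, 0) and (3, 4) give a squared distance of 25, because the covariance (Bures) term vanishes. The eigendecomposition-based square root is used instead of a general matrix square root so that the result stays real for symmetric inputs.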