LG OC · Jun 4, 2025

PPO in the Fisher-Rao geometry

Razvan-Andrei Lascu, David Šiška, Łukasz Szpruch

arXiv:2506.03757v113.03 citations · h-index: 13

Originality: Highly original

AI Analysis

The paper addresses a foundational theoretical gap for reinforcement learning practitioners: PPO is widely used but lacks formal guarantees, and this work takes a step toward formal convergence results for PPO-based algorithms.

The paper tackles the lack of theoretical guarantees in Proximal Policy Optimization (PPO) by deriving a tighter surrogate in the Fisher-Rao geometry. The resulting variant, FR-PPO, provides monotonic policy improvement and, in the tabular setting, achieves sub-linear convergence with no dependence on the dimensionality of the state or action space.

Proximal Policy Optimization (PPO) has become a widely adopted algorithm for reinforcement learning, offering a practical policy gradient method with strong empirical performance. Despite its popularity, PPO lacks formal theoretical guarantees for policy improvement and convergence. PPO is motivated by Trust Region Policy Optimization (TRPO), which uses a surrogate loss with a KL divergence penalty that arises from linearizing the value function within a flat geometric space. In this paper, we derive a tighter surrogate in the Fisher-Rao (FR) geometry, yielding a novel variant, Fisher-Rao PPO (FR-PPO). Our proposed scheme provides strong theoretical guarantees, including monotonic policy improvement. Furthermore, in the tabular setting, we demonstrate that FR-PPO achieves sub-linear convergence without any dependence on the dimensionality of the action or state spaces, marking a significant step toward establishing formal convergence results for PPO-based algorithms.
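
For orientation, the KL-penalized surrogate that the abstract attributes to TRPO can be sketched for a tabular policy as follows. This is a minimal illustration of the classical baseline only, not the Fisher-Rao surrogate derived in the paper, whose exact form is not given here. The function names, the state weighting, the direction of the KL term (old policy relative to new, as in the KL-penalty form of PPO), and the penalty coefficient `beta` are all assumptions made for this example.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities.

    Assumes q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def kl_penalized_surrogate(pi_old, pi_new, advantages, state_weights, beta):
    """Hypothetical tabular KL-penalized surrogate (illustrative only).

    pi_old, pi_new : per-state action probabilities, indexed [state][action]
    advantages     : advantage estimates under the old policy, [state][action]
    state_weights  : weights over states (e.g. a visitation distribution)
    beta           : penalty coefficient on the KL term (assumed parameter)
    """
    total = 0.0
    for s, weight in enumerate(state_weights):
        # Old-policy advantage, averaged over the new policy's action choice.
        gain = sum(pn * adv for pn, adv in zip(pi_new[s], advantages[s]))
        # Penalty for moving away from the old policy in this state.
        penalty = kl_divergence(pi_old[s], pi_new[s])
        total += weight * (gain - beta * penalty)
    return total


# Toy usage: one state, two actions, advantages centred under the old policy.
pi_old = [[0.5, 0.5]]
pi_new = [[0.6, 0.4]]
advantages = [[1.0, -1.0]]
value = kl_penalized_surrogate(pi_old, pi_new, advantages, [1.0], beta=1.0)
```

In this sketch a larger `beta` keeps the new policy closer to the old one; the paper's contribution is to replace this flat-geometry penalty with a tighter one derived in the Fisher-Rao geometry.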
