OC LG SY NA MLMar 28, 2024

Fisher-Rao Gradient Flows of Linear Programs and State-Action Natural Policy Gradients

Johannes Müller, Semih Çaycı, Guido Montúfar

arXiv:2403.19448v29.45 citationsh-index: 11

Originality Incremental advance

AI Analysis

This work provides incremental theoretical insights for researchers in optimization and reinforcement learning, focusing on specific gradient methods with limited prior analysis.

The paper tackles the theoretical analysis of Fisher-Rao gradient flows for linear programs and state-action natural policy gradients, showing linear convergence rates dependent on problem geometry and improving error bounds for entropic regularization.

Kakade's natural policy gradient method has been studied extensively in recent years, showing linear convergence with and without regularization. We study another natural gradient method based on the Fisher information matrix of the state-action distributions which has received little attention from the theoretical side. Here, the state-action distributions follow the Fisher-Rao gradient flow inside the state-action polytope with respect to a linear potential. Therefore, we study Fisher-Rao gradient flows of linear programs more generally and show linear convergence with a rate that depends on the geometry of the linear program. Equivalently, this yields an estimate on the error induced by entropic regularization of the linear program which improves existing results. We extend these results and show sublinear convergence for perturbed Fisher-Rao gradient flows and natural gradient flows up to an approximation error. In particular, these general results cover the case of state-action natural policy gradients.

View on arXiv PDF

Similar