ML | LG | OC | ST | Jan 30, 2023

A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence

arXiv:2301.13139v4 | 26 citations | h-index: 14
Originality: Highly original
AI Analysis

This work addresses a theoretical gap in reinforcement learning that matters to both researchers and practitioners, though it builds incrementally on existing policy optimization methods.

The authors address the lack of theoretical guarantees for policy optimization methods with general parameterizations in reinforcement learning by introducing a framework based on mirror descent. They obtain the first linear convergence result for a policy-gradient-based method with general parameterization, derive a sample complexity for shallow neural networks that improves on the previous best results, and validate the theory empirically on classic control tasks.

Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe their success to the use of parameterized policies. However, while theoretical guarantees have been established for this class of algorithms, especially in the tabular setting, the use of general parameterization schemes remains mostly unjustified. In this work, we introduce a novel framework for policy optimization based on mirror descent that naturally accommodates general parameterizations. The policy class induced by our scheme recovers known classes, e.g., softmax, and generates new ones depending on the choice of mirror map. Using our framework, we obtain the first result that guarantees linear convergence for a policy-gradient-based method involving general parameterization. To demonstrate the ability of our framework to accommodate general parameterization schemes, we provide its sample complexity when using shallow neural networks, show that it represents an improvement upon the previous best results, and empirically validate the effectiveness of our theoretical claims on classic control tasks.
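To make the softmax remark in the abstract concrete, the following is a minimal sketch of policy mirror descent in the tabular setting with the negative-entropy mirror map, where the update reduces to a multiplicative (softmax-style) rule. It is not the paper's algorithm for general parameterizations: it assumes exact Q-values and a tiny hand-made MDP, and the names `pmd_step`, `q_values_of`, `P`, `R`, and the step size are illustrative assumptions.

```python
import numpy as np

def pmd_step(policy, q_values, step_size):
    """One tabular mirror-descent step with the negative-entropy mirror map.

    With this mirror map the update is multiplicative:
    new_policy(a|s) is proportional to policy(a|s) * exp(step_size * Q(s, a)).
    Both arrays have shape (n_states, n_actions).
    """
    logits = np.log(policy) + step_size * q_values
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum(axis=1, keepdims=True)

def q_values_of(policy, P, R, gamma):
    """Exact Q-values of `policy` for a small MDP.

    P has shape (n_states, n_actions, n_states); R has shape (n_states, n_actions).
    """
    n_states = R.shape[0]
    P_pi = np.einsum("sa,sat->st", policy, P)   # state-to-state transitions under policy
    r_pi = (policy * R).sum(axis=1)             # expected one-step reward under policy
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    return R + gamma * P @ v

# Hypothetical 2-state, 2-action MDP: action 0 stays, action 1 switches state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[0, 1, 1] = 1.0
P[1, 0, 1] = P[1, 1, 0] = 1.0
# Reward 1 for moving to (or staying in) state 1, else 0.
R = np.array([[0.0, 1.0],
              [1.0, 0.0]])
gamma = 0.9

policy = np.full((2, 2), 0.5)  # start from the uniform policy
for _ in range(50):
    policy = pmd_step(policy, q_values_of(policy, P, R, gamma), step_size=1.0)
```

After the loop, `policy` concentrates on switching in state 0 and staying in state 1. Other mirror maps would give different update rules and hence different policy classes, which is the flexibility the abstract refers to.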

Code Implementations: 1 repo