LG · AI · SY · OC · Oct 8, 2023

The Reinforce Policy Gradient Algorithm Revisited

arXiv:2310.05000v1 · 4 citations · h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses a theoretical limitation in reinforcement learning for continuous or large-scale systems, though it appears incremental as it builds directly on the existing Reinforce algorithm.

The authors tackled the challenge of applying the Reinforce policy gradient algorithm to systems with infinite state and action spaces by proposing an enhancement that uses random search to estimate the policy gradient, which relaxes regularity requirements for convergence. They proved that this modified algorithm converges to a neighborhood of a local minimum.

We revisit the Reinforce policy gradient algorithm from the literature. This algorithm typically works with cost returns obtained over random-length episodes, which arise either from termination upon reaching a goal state (as with episodic tasks) or from instants of visit to a prescribed recurrent state (in the case of continuing tasks). We propose a major enhancement to the basic algorithm: we estimate the policy gradient using a function measurement over a perturbed parameter by appealing to a class of random search approaches. This has advantages in the case of systems with infinite state and action spaces, as it relaxes some of the regularity requirements that would otherwise be needed for proving convergence of the Reinforce algorithm. We observe that even though we estimate the gradient of the performance objective using the performance objective itself (and not via the sample gradient), the algorithm converges to a neighborhood of a local minimum. We also provide a proof of convergence for this new algorithm.
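The abstract does not spell out the estimator, so the following is only a minimal sketch of one common random-search form, assuming a single cost measurement at a Gaussian-perturbed parameter (a smoothed-functional style estimate). The names `noisy_quadratic_cost`, `perturbed_gradient_estimate`, and `averaged_estimate`, the perturbation size `delta`, and the toy quadratic cost are all illustrative assumptions, not the paper's notation or method. The toy cost stands in for the cost return of one episode run under policy parameter `theta`.

```python
import random


def noisy_quadratic_cost(theta, rng, noise_std=0.1):
    """Hypothetical stand-in for a noisy episodic cost return at parameter theta."""
    return sum(t * t for t in theta) + rng.gauss(0.0, noise_std)


def perturbed_gradient_estimate(cost, theta, delta, rng):
    """Estimate the gradient from ONE cost measurement at a perturbed parameter.

    Assumed form: draw a standard Gaussian direction, measure the cost at
    theta + delta * direction, and scale the direction by measurement / delta.
    No gradient of the cost or of the policy is ever evaluated.
    """
    direction = [rng.gauss(0.0, 1.0) for _ in theta]
    perturbed = [t + delta * d for t, d in zip(theta, direction)]
    measurement = cost(perturbed, rng)
    return [d * measurement / delta for d in direction]


def averaged_estimate(cost, theta, delta, num_samples, rng):
    """Average independent single-measurement estimates to reduce variance."""
    total = [0.0] * len(theta)
    for _ in range(num_samples):
        sample = perturbed_gradient_estimate(cost, theta, delta, rng)
        total = [acc + s for acc, s in zip(total, sample)]
    return [acc / num_samples for acc in total]


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [1.0, -2.0]
    # The true gradient of the noiseless toy cost at theta is [2.0, -4.0].
    print(averaged_estimate(noisy_quadratic_cost, theta, 0.5, 20000, rng))
```

Each single-measurement estimate is very noisy, which is why the sketch averages many of them before comparing against the true gradient. In a stochastic-approximation setting one would instead feed each estimate into an update of the form theta minus step size times estimate, with diminishing step sizes; that schedule is likewise an assumption here and not taken from the abstract.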

