LG May 8, 2024

Fast Stochastic Policy Gradient: Negative Momentum for Reinforcement Learning

arXiv:2405.12228v1
Originality: Incremental advance
AI Analysis

This work addresses the problem of accelerating optimization in reinforcement learning for practitioners, but it is incremental as it builds on existing SPG methods with a modified momentum technique.

The paper tackles the challenge of slow convergence in stochastic policy gradient (SPG) methods for reinforcement learning by developing SPG-NM, a fast algorithm that incorporates a novel negative momentum technique. Numerical results on bandit and MDP tasks show faster convergence rates compared to state-of-the-art algorithms, confirming the effectiveness of the approach.

Stochastic optimization algorithms, particularly stochastic policy gradient (SPG), have achieved significant success in reinforcement learning (RL). Nevertheless, how to quickly reach an optimal solution in RL remains a challenge. To tackle this issue, this work develops a fast SPG algorithm that makes use of momentum, coined SPG-NM. Specifically, SPG-NM applies a novel type of negative momentum (NM) technique to the classical SPG algorithm. Unlike existing NM techniques, we adopt a few hyper-parameters in our SPG-NM algorithm. Moreover, its computational complexity is nearly the same as that of modern SPG-type algorithms, e.g., accelerated policy gradient (APG), which equips SPG with Nesterov's accelerated gradient (NAG). We evaluate the resulting algorithm on two classical tasks: the bandit setting and the Markov decision process (MDP). Numerical results on these tasks show a faster convergence rate than state-of-the-art algorithms, confirming the positive impact of NM in accelerating SPG for RL. In addition, numerical experiments under different settings confirm the robustness of SPG-NM to certain crucial hyper-parameters, so users can apply it freely in practice.
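To make the idea of adding a momentum term to SPG concrete, here is a minimal sketch on a multi-armed bandit with a softmax policy. It is not the SPG-NM update from the paper, which the abstract does not spell out. It assumes a generic heavy-ball form, theta_next = theta + lr * grad + beta * (theta - theta_prev), where a negative beta stands in for "negative momentum" and beta = 0 recovers plain SPG. The function name, the default values, and the deterministic rewards are all illustrative assumptions.

```python
import math
import random


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def spg_with_momentum(mean_rewards, beta=-0.3, lr=0.1, steps=2000, seed=0):
    """Softmax policy gradient on a bandit with a heavy-ball momentum term.

    beta < 0 gives a negative momentum coefficient (illustrative only,
    NOT the paper's SPG-NM rule); beta = 0 recovers plain SPG.
    Returns the final action probabilities.
    """
    rng = random.Random(seed)
    k = len(mean_rewards)
    theta = [0.0] * k        # logits; the initial policy is uniform
    prev_theta = theta[:]    # previous iterate, used by the momentum term
    for _ in range(steps):
        pi = softmax(theta)
        a = rng.choices(range(k), weights=pi)[0]  # sample an arm from the policy
        r = mean_rewards[a]                       # deterministic reward (assumption)
        # REINFORCE estimate of the policy gradient: r * (e_a - pi)
        grad = [r * ((1.0 if i == a else 0.0) - pi[i]) for i in range(k)]
        new_theta = [
            theta[i] + lr * grad[i] + beta * (theta[i] - prev_theta[i])
            for i in range(k)
        ]
        prev_theta, theta = theta, new_theta
    return softmax(theta)


if __name__ == "__main__":
    # Compare a negative coefficient against plain SPG on the same bandit.
    print("beta=-0.3:", spg_with_momentum([1.0, 0.8, 0.2], beta=-0.3))
    print("beta= 0.0:", spg_with_momentum([1.0, 0.8, 0.2], beta=0.0))
```

The momentum term costs one extra vector of the same size as the parameters and one extra addition per step. That is consistent with the abstract's remark that the computational cost stays close to that of other SPG-type methods, although the paper's actual update may be structured differently.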

