Enhancing Policy Gradient with the Polyak Step-Size Adaption
This work addresses a practical bottleneck for reinforcement learning researchers and practitioners, but the contribution is incremental: it adapts an existing optimization method to a specific domain.
The paper tackles the sensitivity of policy gradient methods to hyper-parameters, particularly the step-size, by integrating Polyak step-size adaptation into reinforcement learning; the authors report faster convergence and more stable policies.
Policy gradient is a widely used, foundational algorithm in reinforcement learning (RL). Although it is renowned for its convergence guarantees and stability compared to other RL algorithms, its practical application is often hindered by sensitivity to hyper-parameters, particularly the step-size. In this paper, we integrate the Polyak step-size into RL; it adjusts the step-size automatically without prior knowledge. To adapt this method to RL settings, we address several issues, including the unknown f* in the Polyak step-size. We also evaluate the Polyak step-size in RL experimentally, demonstrating faster convergence and more stable policies.
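For orientation, the classical Polyak step-size for minimizing a function f sets the step at iterate x_t to (f(x_t) - f*) / ||grad f(x_t)||^2, where f* is the optimal value. The sketch below illustrates that classical rule on a toy quadratic whose f* is known to be 0. It is not the paper's RL variant, which has to cope with an unknown f*. The names polyak_step_size, polyak_gradient_descent, and quadratic, as well as the small eps guard against a zero gradient, are assumptions introduced here for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def polyak_step_size(f_x: float, f_star: float, grad: Sequence[float],
                     eps: float = 1e-12) -> float:
    """Classical Polyak step: (f(x) - f*) / ||grad||^2.

    eps is an illustrative guard (an assumption, not part of the rule)
    that avoids division by zero when the gradient vanishes.
    """
    grad_sq_norm = sum(g * g for g in grad)
    return (f_x - f_star) / (grad_sq_norm + eps)


def polyak_gradient_descent(f: Callable[[Vector], float],
                            grad_f: Callable[[Vector], Vector],
                            x0: Sequence[float],
                            f_star: float,
                            num_steps: int = 20) -> Vector:
    """Gradient descent where each step length comes from the Polyak rule."""
    x = list(x0)
    for _ in range(num_steps):
        g = grad_f(x)
        eta = polyak_step_size(f(x), f_star, g)
        # Standard descent update with the adaptive step eta.
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x


def quadratic(x: Vector) -> float:
    """Toy objective f(x) = 0.5 * ||x||^2, whose minimum value f* is 0."""
    return 0.5 * sum(xi * xi for xi in x)


def quadratic_grad(x: Vector) -> Vector:
    """Gradient of the toy objective, which is x itself."""
    return list(x)


x_final = polyak_gradient_descent(quadratic, quadratic_grad,
                                  [3.0, -4.0], f_star=0.0)
```

On this quadratic the rule gives a step of about 0.5 at every iteration, so each update roughly halves the iterate with no hand-tuned learning rate. The catch is that the rule needs f*, which is exactly the quantity that is unavailable in RL and that the paper sets out to handle.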