Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines
This work offers an incremental improvement for reinforcement learning researchers and practitioners: it refines baseline techniques to make training more efficient.
The paper addresses variance reduction in policy gradient methods for reinforcement learning by introducing action-dependent baselines, and shows that they can achieve lower variance than action-independent baselines without introducing bias.
We show how an action-dependent baseline can be used in the policy gradient theorem with function approximation, which was originally presented with action-independent baselines by Sutton et al. (2000).
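To make the bias and variance claims concrete, here is a minimal sketch on a toy problem: a single state, three actions, and a softmax policy. It is an illustration under stated assumptions, not the paper's construction. The values in `theta` and `q`, the helper names, and the particular action-dependent baseline are all hypothetical. The sketch relies on a standard identity: subtracting a baseline b(a) from the return stays unbiased only if the term E_a[grad log pi(a) b(a)] is added back, and that term vanishes when b does not depend on the action. Here the correction is computed exactly, which is feasible only because the toy problem is tiny.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def grad_log_pi(theta):
    """Row a holds d/dtheta log pi(a) for a softmax policy: e_a - pi."""
    pi = softmax(theta)
    return np.eye(len(theta)) - pi

def estimator_stats(theta, q, baseline):
    """Exact mean and total variance of the single-sample estimator
    grad log pi(a) * (q[a] - baseline[a]) + correction, with a ~ pi."""
    pi = softmax(theta)
    glp = grad_log_pi(theta)  # shape (K, K)
    # Correction E_a[grad log pi(a) * baseline[a]]; zero if baseline is constant in a.
    correction = (pi[:, None] * glp * baseline[:, None]).sum(axis=0)
    samples = glp * (q - baseline)[:, None] + correction  # one row per action
    mean = (pi[:, None] * samples).sum(axis=0)
    var = (pi * ((samples - mean) ** 2).sum(axis=1)).sum()
    return mean, var

# Hypothetical toy values, chosen only for illustration.
theta = np.array([0.2, -0.5, 1.0])
q = np.array([1.0, 3.0, 2.0])
pi = softmax(theta)
true_grad = (pi[:, None] * grad_log_pi(theta) * q[:, None]).sum(axis=0)

# No baseline, an action-independent baseline (the expected return),
# and a hypothetical action-dependent baseline that roughly tracks q.
mean_none, var_none = estimator_stats(theta, q, np.zeros(3))
mean_state, var_state = estimator_stats(theta, q, np.full(3, pi @ q))
mean_action, var_action = estimator_stats(theta, q, q + 0.1 * np.array([1.0, -1.0, 0.5]))

print("variance:", var_none, var_state, var_action)
```

All three estimators have the same mean as `true_grad`, and in this example the action-dependent baseline yields the smallest variance. In a realistic setting the correction term would itself have to be estimated or obtained analytically, so the benefit depends on how cheaply and accurately that can be done.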