Hindsight Experience Replay Accelerates Proximal Policy Optimization
The work addresses slow learning in on-policy RL on sparse-reward tasks, though the contribution is incremental: it adapts an existing technique (HER) to a different algorithm (PPO).
The paper tackles slow on-policy reinforcement learning in sparse-reward environments by applying hindsight experience replay (HER) to proximal policy optimization (PPO), and reports dramatically faster learning in a custom predator-prey environment.
Hindsight experience replay (HER) accelerates off-policy reinforcement learning algorithms for environments that emit sparse rewards by modifying the goal of the episode post-hoc to be some state achieved during the episode. Because post-hoc modification of the observed goal violates the assumptions of on-policy algorithms, HER is not typically applied to on-policy algorithms. Here, we show that HER can dramatically accelerate proximal policy optimization (PPO), an on-policy reinforcement learning algorithm, when tested on a custom predator-prey environment.
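The post-hoc goal modification described above can be illustrated with a minimal sketch. It assumes a toy grid world with tuple-valued states and a 0/1 sparse reward, and it uses the "final state" relabeling strategy. The names `Transition`, `sparse_reward`, and `relabel_with_final_state` are hypothetical and chosen for illustration. The sketch shows only the relabeling step, not how the paper feeds relabeled episodes into the PPO update, and the predator-prey environment is not modeled.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

State = Tuple[int, int]  # assumption: states are grid coordinates


@dataclass(frozen=True)
class Transition:
    state: State
    action: int
    next_state: State
    goal: State
    reward: float


def sparse_reward(achieved: State, goal: State) -> float:
    """Assumed sparse reward: 1 only when the goal is reached, else 0."""
    return 1.0 if achieved == goal else 0.0


def relabel_with_final_state(episode: List[Transition]) -> List[Transition]:
    """Replace the episode's goal with the state it actually ended in,
    then recompute every reward against that substituted goal."""
    if not episode:
        return []
    new_goal = episode[-1].next_state
    return [
        replace(t, goal=new_goal, reward=sparse_reward(t.next_state, new_goal))
        for t in episode
    ]


# A failed episode: the agent never reaches the original goal (5, 5).
original_goal = (5, 5)
path = [(0, 0), (1, 0), (1, 1), (2, 1)]
episode = [
    Transition(s, 0, s_next, original_goal, sparse_reward(s_next, original_goal))
    for s, s_next in zip(path, path[1:])
]

# After relabeling, the same trajectory counts as a success for goal (2, 1).
relabeled = relabel_with_final_state(episode)
```

In this sketch the original episode carries no reward signal at all, while the relabeled copy ends with a reward of 1. The relabeled actions were still chosen by a policy conditioned on the original goal, which is the mismatch with on-policy assumptions that the abstract refers to.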