Mixing Human Demonstrations with Self-Exploration in Experience Replay for Deep Reinforcement Learning
This work addresses the problem of improving training efficiency in reinforcement learning for tasks such as goal-reaching with obstacles, but the contribution is incremental, as it builds on existing methods.
The study examined the impact of incorporating human demonstrations into the replay buffer for Deep Reinforcement Learning. It found that agents trained with pure self-exploration and with pure demonstration achieved similar success rates, but the pure demonstration model converged faster to solutions that take fewer steps.
We investigate the effect of using human demonstration data in the replay buffer for Deep Reinforcement Learning. We use a policy gradient method with a modified experience replay buffer in which a human demonstration experience is sampled with a given probability. We analyze different ratios of demonstration data in a task where an agent attempts to reach a goal while avoiding obstacles. Our results suggest that while the agents trained by pure self-exploration and pure demonstration had similar success rates, the pure demonstration model converged faster to solutions that take fewer steps.
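
The modified replay buffer described above can be illustrated with a minimal sketch. This is not the implementation used in the study: the class name `MixedReplayBuffer`, the parameter `demo_prob`, the per-sample (rather than per-batch) coin flip, and the fallback to the other pool when one is empty are all assumptions made for illustration.

```python
import random
from collections import deque


class MixedReplayBuffer:
    """Replay buffer that mixes fixed human demonstrations with agent experience.

    Hypothetical sketch: each sampled transition is drawn from the
    demonstration pool with probability `demo_prob`, otherwise from the
    agent's own self-exploration pool.
    """

    def __init__(self, demo_transitions, capacity=10000, demo_prob=0.5, seed=None):
        if not 0.0 <= demo_prob <= 1.0:
            raise ValueError("demo_prob must lie in [0, 1]")
        self.demo = list(demo_transitions)   # fixed demonstration data
        self.agent = deque(maxlen=capacity)  # oldest agent data is evicted first
        self.demo_prob = demo_prob
        self.rng = random.Random(seed)

    def add(self, transition):
        """Store one transition collected by the agent's own exploration."""
        self.agent.append(transition)

    def sample(self, batch_size):
        """Return `batch_size` transitions, sampled with replacement."""
        if not self.demo and not self.agent:
            raise ValueError("cannot sample from an empty buffer")
        batch = []
        for _ in range(batch_size):
            use_demo = self.rng.random() < self.demo_prob
            # Assumption: if the chosen pool is empty, fall back to the other one.
            if use_demo and not self.demo:
                use_demo = False
            elif not use_demo and not self.agent:
                use_demo = True
            pool = self.demo if use_demo else self.agent
            batch.append(self.rng.choice(pool))
        return batch


# Example: transitions are tagged tuples so the source of each sample is visible.
demos = [("demo", i) for i in range(5)]
buffer = MixedReplayBuffer(demos, demo_prob=0.5, seed=0)
for i in range(5):
    buffer.add(("agent", i))
batch = buffer.sample(8)
```

Under these assumptions, `demo_prob=1.0` corresponds to the pure demonstration setting and `demo_prob=0.0` to pure self-exploration; intermediate values give the mixed ratios analyzed in the study.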