AI, Jul 14, 2021

Mixing Human Demonstrations with Self-Exploration in Experience Replay for Deep Reinforcement Learning

arXiv:2107.06840v1
Originality: Synthesis-oriented
AI Analysis

This paper addresses the problem of improving training efficiency in reinforcement learning for tasks such as reaching a goal while avoiding obstacles. The contribution is incremental, since it builds on existing methods.

The study examined the impact of incorporating human demonstrations into the replay buffer for Deep Reinforcement Learning. Agents trained with pure self-exploration and with pure demonstration achieved similar success rates, but the pure demonstration model converged faster to solutions with fewer steps.

We investigate the effect of using human demonstration data in the replay buffer for Deep Reinforcement Learning. We use a policy gradient method with a modified experience replay buffer where a human demonstration experience is sampled with a given probability. We analyze different ratios of using demonstration data in a task where an agent attempts to reach a goal while avoiding obstacles. Our results suggest that while the agents trained by pure self-exploration and pure demonstration had similar success rates, the pure demonstration model converged faster to solutions with fewer steps.
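The modified replay buffer described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `MixedReplayBuffer`, the parameter `demo_prob`, the tuple layout of a transition, and the fallback behaviour when one pool is empty are all assumptions made for illustration. The abstract only states that a demonstration experience is sampled with a given probability.

```python
import random
from collections import deque


class MixedReplayBuffer:
    """Hypothetical buffer mixing human demonstrations with agent experience.

    Each sampled transition comes from the demonstration pool with
    probability `demo_prob`, otherwise from the agent's own experience.
    """

    def __init__(self, demo_transitions, capacity=10_000, demo_prob=0.5, seed=None):
        if not 0.0 <= demo_prob <= 1.0:
            raise ValueError("demo_prob must be in [0, 1]")
        self.demo = list(demo_transitions)   # fixed set of human demonstrations
        self.agent = deque(maxlen=capacity)  # self-exploration data, oldest dropped first
        self.demo_prob = demo_prob
        self.rng = random.Random(seed)

    def add(self, transition):
        """Store one transition collected by the agent itself."""
        self.agent.append(transition)

    def sample(self, batch_size):
        """Draw `batch_size` transitions, choosing the source pool per sample."""
        batch = []
        for _ in range(batch_size):
            use_demo = self.rng.random() < self.demo_prob
            if use_demo and self.demo:
                source = self.demo
            elif self.agent:
                source = self.agent
            elif self.demo:
                # Assumption: fall back to demonstrations while the agent pool is empty.
                source = self.demo
            else:
                raise ValueError("both pools are empty")
            batch.append(self.rng.choice(source))
        return batch
```

Under these assumptions, `demo_prob=0.0` corresponds to pure self-exploration and `demo_prob=1.0` to pure demonstration, with intermediate values giving the mixed ratios the abstract mentions.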

