LG · Mar 21, 2022

Self-Imitation Learning from Demonstrations

arXiv:2203.10905v1 · 8 citations · h-index: 10
Originality: Incremental advance
AI Analysis

This paper addresses sparse rewards and suboptimal demonstrations in reinforcement learning, offering practitioners an incremental improvement over existing methods.

The paper tackles learning from suboptimal demonstrations in sparse-reward reinforcement learning by extending Self-Imitation Learning to incorporate demonstrations. The resulting algorithm, SILfD, is reported to outperform state-of-the-art methods, especially when demonstrations are highly suboptimal.

Despite the numerous breakthroughs achieved with Reinforcement Learning (RL), solving environments with sparse rewards remains a challenging task that requires sophisticated exploration. Learning from Demonstrations (LfD) remedies this issue by guiding the agent's exploration towards states experienced by an expert. Naturally, the benefits of this approach hinge on the quality of demonstrations, which are rarely optimal in realistic scenarios. Modern LfD algorithms require meticulous tuning of hyperparameters that control the influence of demonstrations and, as we show in the paper, struggle with learning from suboptimal demonstrations. To address these issues, we extend Self-Imitation Learning (SIL), a recent RL algorithm that exploits the agent's past good experience, to the LfD setup by initializing its replay buffer with demonstrations. We denote our algorithm as SIL from Demonstrations (SILfD). We empirically show that SILfD can learn from demonstrations that are noisy or far from optimal and can automatically adjust the influence of demonstrations throughout the training without additional hyperparameters or handcrafted schedules. We also find SILfD superior to the existing state-of-the-art LfD algorithms in sparse environments, especially when demonstrations are highly suboptimal.
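The core idea in the abstract, seeding the Self-Imitation Learning replay buffer with demonstrations, can be illustrated with a minimal sketch. This is not the authors' implementation. The class and function names (`SelfImitationBuffer`, `sil_losses`, and so on), the FIFO eviction policy, and the discount factor are assumptions made for illustration. The clipped-advantage losses follow the standard SIL formulation, where a stored transition contributes only if its return exceeds the current value estimate.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Transition:
    state: Tuple[float, ...]
    action: int
    ret: float        # discounted return observed from this step onward
    from_demo: bool   # True if the transition came from a demonstration


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Compute the discounted return for every step of one episode."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


class SelfImitationBuffer:
    """Replay buffer shared by demonstrations and the agent's own episodes.

    Assumption: plain FIFO eviction, so old demonstrations can be pushed out.
    """

    def __init__(self, capacity: int = 10_000):
        self.capacity = capacity
        self.data: List[Transition] = []

    def add_episode(self, states, actions, rewards, gamma=0.99, from_demo=False):
        for s, a, g in zip(states, actions, discounted_returns(rewards, gamma)):
            self.data.append(Transition(tuple(s), a, g, from_demo))
        if len(self.data) > self.capacity:
            self.data = self.data[-self.capacity:]


def sil_weight(ret: float, value: float) -> float:
    """Clipped advantage (R - V)+ used by SIL."""
    return max(ret - value, 0.0)


def sil_losses(log_prob: float, value: float, ret: float) -> Tuple[float, float]:
    """Per-sample SIL policy and value losses for one stored transition."""
    adv = sil_weight(ret, value)
    policy_loss = -log_prob * adv
    value_loss = 0.5 * adv ** 2
    return policy_loss, value_loss


# Seed the buffer with a (possibly suboptimal) demonstration before training.
buffer = SelfImitationBuffer()
buffer.add_episode(
    states=[(0.0,), (1.0,), (2.0,)],
    actions=[1, 1, 0],
    rewards=[0.0, 0.0, 1.0],   # sparse reward at the end of the episode
    gamma=0.5,
    from_demo=True,
)
```

Under these assumptions, a demonstration transition stops contributing to the loss once the critic's value estimate for its state exceeds the demonstrated return, because the clipped advantage becomes zero. This is one way to read the abstract's claim that the influence of demonstrations adjusts automatically without an extra hyperparameter or schedule.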

