LG · AI · ML · Jun 14, 2018

Self-Imitation Learning

arXiv:1806.05635v1 · 296 citations
Originality: Incremental advance
AI Analysis

The paper addresses exploration challenges in reinforcement learning, but the advance is incremental because it builds on existing actor-critic methods.

The paper tackles the problem of exploration in reinforcement learning by proposing Self-Imitation Learning (SIL), which learns to reproduce the agent's past good decisions. It shows that SIL significantly improves A2C on hard exploration Atari games and is competitive with state-of-the-art count-based exploration methods.

This paper proposes Self-Imitation Learning (SIL), a simple off-policy actor-critic algorithm that learns to reproduce the agent's past good decisions. This algorithm is designed to verify our hypothesis that exploiting past good experiences can indirectly drive deep exploration. Our empirical results show that SIL significantly improves advantage actor-critic (A2C) on several hard exploration Atari games and is competitive with the state-of-the-art count-based exploration methods. We also show that SIL improves proximal policy optimization (PPO) on MuJoCo tasks.
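As a rough illustration of what "reproducing past good decisions" can look like, the sketch below computes a self-imitation loss over stored (state, action, return) samples. It assumes the clipped-advantage form usually attributed to SIL: a sample contributes only when its stored return R exceeds the current value estimate V(s). The function names, the plain-Python batch layout, and the `value_coef` default are assumptions made for illustration, not the authors' code.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Discounted return R_t for every step of one finished episode."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def sil_losses(log_probs, values, returns, value_coef=0.01):
    """Self-imitation losses averaged over a batch of replayed samples.

    log_probs[i]: log pi(a_i | s_i) under the current policy
    values[i]:    current value estimate V(s_i)
    returns[i]:   stored discounted return R_i
    value_coef:   weight on the value term (assumed hyperparameter)
    """
    n = len(returns)
    policy_loss = 0.0
    value_loss = 0.0
    for lp, v, ret in zip(log_probs, values, returns):
        # Only "good" past decisions count: the clipped advantage is zero
        # whenever the stored return does not beat the current estimate.
        # In an autodiff framework this factor would be treated as a
        # constant in the policy term.
        gain = max(ret - v, 0.0)
        policy_loss += -lp * gain
        value_loss += 0.5 * gain ** 2
    policy_loss /= n
    value_loss /= n
    return policy_loss, value_loss, policy_loss + value_coef * value_loss


# Usage: two replayed samples, only the first beats its value estimate.
episode_returns = discounted_returns([1.0, 0.0, 2.0], gamma=0.5)
p_loss, v_loss, total = sil_losses(
    log_probs=[math.log(0.5), math.log(0.5)],
    values=[1.0, 3.0],
    returns=[2.0, 2.0],
)
```

In this example the second sample has R < V(s), so it contributes nothing to either term; only the first sample pushes the policy toward its stored action and the value estimate up toward the observed return.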

Code Implementations · 4 repos
