LG · AI · ML · Dec 3, 2018

Generative Adversarial Self-Imitation Learning

arXiv:1812.00950v1 · 61 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of long-term credit assignment in reinforcement learning for environments with delayed rewards, though it appears incremental as it builds on existing imitation learning and policy gradient methods.

The paper tackles the problem of sparse and delayed rewards in reinforcement learning by proposing Generative Adversarial Self-Imitation Learning (GASIL), which encourages agents to imitate past good trajectories, resulting in improved performance on 2D Point Mass and MuJoCo environments when combined with proximal policy optimization.

This paper explores a simple regularizer for reinforcement learning by proposing Generative Adversarial Self-Imitation Learning (GASIL), which encourages the agent to imitate past good trajectories via the generative adversarial imitation learning framework. Instead of directly maximizing rewards, GASIL focuses on reproducing past good trajectories, which can potentially make long-term credit assignment easier when rewards are sparse and delayed. GASIL can be easily combined with any policy gradient objective by using GASIL as a learned shaped reward function. Our experimental results show that GASIL improves the performance of proximal policy optimization on 2D Point Mass and MuJoCo environments with delayed reward and stochastic dynamics.
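The two ingredients the abstract describes, a store of past good trajectories and a learned shaped reward added to the policy gradient objective, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `GoodTrajectoryBuffer` and `shaped_reward`, the top-K-by-return selection rule, the weight `alpha`, and the sign convention (a discriminator output `d_prob` read as the probability that a state-action pair came from the buffer) are all assumptions made for this example.

```python
import heapq
import itertools
import math


class GoodTrajectoryBuffer:
    """Hypothetical helper: keeps the K highest-return trajectories seen so far."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []  # min-heap of (return, tie_breaker, trajectory)
        self._counter = itertools.count()  # tie-breaker so trajectories are never compared

    def add(self, trajectory, episode_return):
        item = (episode_return, next(self._counter), trajectory)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        elif episode_return > self._heap[0][0]:
            # Evict the worst stored trajectory in favour of the better one.
            heapq.heapreplace(self._heap, item)

    def returns(self):
        """Stored episode returns, best first."""
        return sorted((ret for ret, _, _ in self._heap), reverse=True)

    def transitions(self):
        """Flat list of (state, action) pairs to use as 'expert' samples."""
        return [sa for _, _, traj in self._heap for sa in traj]


def shaped_reward(env_reward, d_prob, alpha=0.1, eps=1e-8):
    """Imitation reward from the discriminator plus a weighted environment reward.

    Assumption: d_prob is the discriminator's probability that (s, a) came
    from the good-trajectory buffer, so a higher d_prob means a higher reward.
    """
    return math.log(d_prob + eps) + alpha * env_reward


# Usage: only the two best of four episodes survive in the buffer.
buf = GoodTrajectoryBuffer(capacity=2)
buf.add([(0, 1)], 1.0)
buf.add([(1, 0)], 5.0)
buf.add([(2, 1)], 3.0)
buf.add([(3, 1)], 0.5)
```

In a full training loop, a discriminator would be fit to separate `buf.transitions()` from fresh on-policy samples, and `shaped_reward` would stand in for the raw reward when computing advantages for a policy gradient method such as proximal policy optimization.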

