LG · ML · Jun 12, 2020

Self-Imitation Learning via Generalized Lower Bound Q-learning

arXiv:2006.07442v3 · 30 citations
Originality: Incremental advance
AI Analysis

This work addresses robustness in reinforcement learning for continuous control tasks, offering incremental improvements over prior self-imitation and n-step methods.

The paper tackles off-policy learning by proposing a generalized n-step lower bound Q-learning method, which it reports to be more robust than return-based self-imitation learning and uncorrected n-step Q-learning across continuous control benchmarks.

Self-imitation learning motivated by lower-bound Q-learning is a novel and effective approach for off-policy learning. In this work, we propose an n-step lower bound which generalizes the original return-based lower-bound Q-learning, and introduce a new family of self-imitation learning algorithms. To provide a formal motivation for the potential performance gains provided by self-imitation learning, we show that n-step lower bound Q-learning achieves a trade-off between fixed point bias and contraction rate, drawing close connections to the popular uncorrected n-step Q-learning. We finally show that n-step lower bound Q-learning is a more robust alternative to return-based self-imitation learning and uncorrected n-step, over a wide range of continuous control benchmark tasks.
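As a rough illustration of the idea in the abstract, the sketch below computes an uncorrected n-step target and uses it as a one-sided lower bound on the Q-value, added to an ordinary one-step TD loss. It is a minimal sketch under stated assumptions and not the paper's exact algorithm: the function names, the weight `beta`, and the way the two terms are combined are all hypothetical choices made for this example.

```python
import numpy as np


def n_step_target(rewards, bootstrap_q, gamma):
    """Uncorrected n-step target: sum_{i<n} gamma^i * r_i + gamma^n * Q(s_n, a_n).

    `rewards` holds the n rewards observed along the stored trajectory;
    `bootstrap_q` is the current Q estimate at the state reached after n steps.
    No off-policy correction is applied, hence "uncorrected".
    """
    n = len(rewards)
    discounts = gamma ** np.arange(n)
    return float(np.dot(discounts, rewards) + gamma ** n * bootstrap_q)


def lower_bound_loss(q_sa, one_step_target, n_step_tgt, beta=1.0):
    """One-step TD loss plus a one-sided n-step term (hypothetical combination).

    The second term is active only when the n-step target exceeds Q(s, a),
    so the n-step target acts as a lower bound rather than a regression target.
    `beta` is an assumed weighting, not a value taken from the paper.
    """
    td_error = one_step_target - q_sa
    gap = max(n_step_tgt - q_sa, 0.0)  # clipped: only pushes Q upward
    return 0.5 * td_error ** 2 + 0.5 * beta * gap ** 2


# Example: three rewards of 1.0, discount 0.5, bootstrap value 4.0.
target = n_step_target([1.0, 1.0, 1.0], bootstrap_q=4.0, gamma=0.5)
loss_below = lower_bound_loss(q_sa=2.0, one_step_target=2.0, n_step_tgt=target)
loss_above = lower_bound_loss(q_sa=3.0, one_step_target=2.0, n_step_tgt=target)
```

When `q_sa` already exceeds the n-step target (the `loss_above` case), the clipped term vanishes and only the one-step TD loss remains. With a full Monte Carlo return in place of the bootstrapped target, the clipped term resembles the return-based lower bound that the abstract says is being generalized.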

