LG · AI · ML · May 9, 2019

Pretrain Soft Q-Learning with Imperfect Demonstrations

arXiv:1905.03501v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses the computational cost of reinforcement learning, a concern for both researchers and practitioners. It is an incremental advance because it adapts existing pretraining concepts to a specific algorithm.

The paper tackles the challenge of pretraining reinforcement learning agents with imperfect demonstrations. It proposes a pretraining method for soft Q-learning that learns effectively from such data and outperforms state-of-the-art methods that learn from expert demonstrations on Atari 2600 tasks.

Pretraining reinforcement learning methods with demonstrations has been an important concept in reinforcement learning research, since existing algorithms spend a large amount of computing power on online simulation. Pretraining remains a significant challenge because the agent must exploit expert demonstrations while keeping its potential to explore, especially for value-based methods. In this paper, we propose a pretraining method for soft Q-learning. Our work is inspired by pretraining methods for actor-critic algorithms, since soft Q-learning is a value-based algorithm that is equivalent to policy gradient. The proposed method is based on $\gamma$-discounted biased policy evaluation with entropy regularization, which is also the update target of soft Q-learning. We evaluate our method on various Atari 2600 tasks. Experiments show that our method learns effectively from imperfect demonstrations and outperforms other state-of-the-art methods that learn from expert demonstrations.
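To make the "update target of soft Q-learning" concrete, here is a minimal sketch of the standard entropy-regularized soft Bellman target, assuming a discrete action space and a temperature parameter `alpha` (our name; the abstract does not state the paper's notation). The target is $r + \gamma V(s')$ with soft value $V(s') = \alpha \log \sum_{a'} \exp(Q(s', a') / \alpha)$. The sketch also checks the identity linking this target to entropy-regularized policy evaluation: under the Boltzmann policy $\pi(a \mid s) \propto \exp(Q(s, a) / \alpha)$, the quantity $\mathbb{E}_{\pi}[Q(s, a) - \alpha \log \pi(a \mid s)]$ equals the soft value. It illustrates plain soft Q-learning only, not the paper's biased policy evaluation on demonstrations, and the helper names are hypothetical.

```python
import numpy as np


def soft_value(q: np.ndarray, alpha: float) -> float:
    """Soft value V(s) = alpha * log sum_a exp(Q(s, a) / alpha).

    The max is subtracted before exponentiating for numerical stability.
    """
    z = q / alpha
    m = np.max(z)
    return float(alpha * (m + np.log(np.sum(np.exp(z - m)))))


def boltzmann_policy(q: np.ndarray, alpha: float) -> np.ndarray:
    """Policy pi(a|s) proportional to exp(Q(s, a) / alpha)."""
    z = q / alpha
    p = np.exp(z - np.max(z))
    return p / p.sum()


def entropy_regularized_value(q: np.ndarray, pi: np.ndarray, alpha: float) -> float:
    """Entropy-regularized evaluation: sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s))."""
    return float(np.sum(pi * (q - alpha * np.log(pi))))


def soft_q_target(reward: float, q_next: np.ndarray, gamma: float,
                  alpha: float, done: bool) -> float:
    """Soft Bellman target r + gamma * V(s'); no bootstrapping at terminal states."""
    if done:
        return reward
    return reward + gamma * soft_value(q_next, alpha)


def soft_q_update(q_table: np.ndarray, s: int, a: int, r: float, s_next: int,
                  done: bool, lr: float, gamma: float, alpha: float) -> None:
    """Hypothetical tabular soft Q-learning step: move Q(s, a) toward the soft target."""
    target = soft_q_target(r, q_table[s_next], gamma, alpha, done)
    q_table[s, a] += lr * (target - q_table[s, a])


if __name__ == "__main__":
    q = np.array([1.0, 2.0, 0.5])  # illustrative Q-values for one state
    pi = boltzmann_policy(q, alpha=0.5)
    # The two printed values agree: soft value == entropy-regularized evaluation.
    print(soft_value(q, 0.5), entropy_regularized_value(q, pi, 0.5))
    # As alpha -> 0 the soft value approaches the hard max used by ordinary Q-learning.
    print(soft_value(q, 1e-3))
```

The temperature controls the trade-off the abstract mentions between exploiting demonstrated actions and keeping exploration: a larger `alpha` spreads probability across actions, while `alpha` near zero recovers the greedy max backup of ordinary Q-learning.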

