RO, Feb 27, 2020

Exploration-efficient Deep Reinforcement Learning with Demonstration Guidance for Robot Control

arXiv:2002.12089v1, 8 citations
AI Analysis

This work addresses sample inefficiency in robot control for researchers and practitioners, but it is incremental as it builds on existing SAC and RLfD methods.

The paper tackles the problem of sample inefficiency and unstable training in deep reinforcement learning for robot control by proposing DRL-EG, a method that uses a small number of expert demonstrations to guide exploration, resulting in improved performance over other RL and RLfD methods and helping agents escape local optima.

Although deep reinforcement learning (DRL) algorithms have achieved important results in many control tasks, they still suffer from sample inefficiency and unstable training, which are usually caused by sparse rewards. Recently, some reinforcement learning from demonstration (RLfD) methods have shown promise in overcoming these problems. However, they usually require a considerable number of demonstrations. To tackle these challenges, we propose DRL-EG (DRL with efficient guidance), a sample-efficient algorithm built on SAC in which a discriminator D(s) and a guider G(s) are modeled from a small number of expert demonstrations. The discriminator determines the appropriate guidance states, and the guider guides the agent toward better exploration during training. Empirical results on several continuous control tasks verify the effectiveness of our method and its performance improvements over other RL and RLfD counterparts. Experimental results also show that DRL-EG can help the agent escape from a local optimum.
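
The abstract does not say how D(s) and G(s) are built or how their output is combined with SAC, so the following is only a rough sketch of the general idea, not the paper's method. It assumes a nearest-neighbour discriminator (a state counts as a guidance state when it lies within a hypothetical `radius` of a demonstrated state) and a guider that returns the action of the nearest demonstrated state. The names `DemoGuidance`, `select_action`, and `radius` are illustrative and do not come from the paper.

```python
import numpy as np


class DemoGuidance:
    """Hypothetical stand-ins for D(s) and G(s), built from a few demo pairs."""

    def __init__(self, demo_states, demo_actions, radius):
        self.states = np.asarray(demo_states, dtype=float)
        self.actions = np.asarray(demo_actions, dtype=float)
        self.radius = float(radius)  # assumed distance threshold, not from the paper

    def _nearest(self, state):
        # Euclidean distance from the query state to every demonstrated state.
        diffs = self.states - np.asarray(state, dtype=float)
        dists = np.linalg.norm(diffs, axis=1)
        idx = int(np.argmin(dists))
        return idx, float(dists[idx])

    def discriminator(self, state):
        """Assumed D(s): True when the state is close to the demonstrations."""
        _, dist = self._nearest(state)
        return dist <= self.radius

    def guider(self, state):
        """Assumed G(s): the action taken at the nearest demonstrated state."""
        idx, _ = self._nearest(state)
        return self.actions[idx]


def select_action(state, policy_action, guidance):
    """Take the guider's action on guidance states, else the policy's action."""
    if guidance.discriminator(state):
        return guidance.guider(state)
    return np.asarray(policy_action, dtype=float)


# Two demonstrated (state, action) pairs; policy actions are placeholders.
guidance = DemoGuidance([[0.0, 0.0], [1.0, 1.0]], [[0.5], [-0.5]], radius=0.3)
near = select_action([0.1, 0.0], [0.0], guidance)  # close to the first demo
far = select_action([5.0, 5.0], [0.2], guidance)   # far from every demo
```

In this sketch the guider overrides the policy only on states the discriminator accepts, so exploration elsewhere is left to the agent. A learned D(s) and G(s), or guidance applied through the training loss rather than at action selection, would be equally consistent with the abstract.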

