Learning Memory-Dependent Continuous Control from Demonstrations
This work addresses the problem of memory-dependent continuous control for robotics, offering an incremental improvement over existing methods by extending them to partially observable settings.
The paper tackles the challenge of efficient exploration in reinforcement learning for partially observable environments by proposing READER, a novel algorithm that learns from both demonstrations and self-exploration. READER requires significantly fewer environment interactions and achieves better sample efficiency than a baseline method.
Efficient exploration has presented a long-standing challenge in reinforcement learning, especially when rewards are sparse. A developmental system can overcome this difficulty by learning from both demonstrations and self-exploration. However, existing methods are not applicable to most real-world robotic control problems because they assume that environments follow Markov decision processes (MDPs); thus, they do not extend to partially observable environments where historical observations are necessary for decision making. This paper builds on the idea of replaying demonstrations for memory-dependent continuous control by proposing a novel algorithm, Recurrent Actor-Critic with Demonstration and Experience Replay (READER). Experiments involving several memory-crucial continuous control tasks reveal that our method significantly reduces interactions with the environment while using a reasonably small number of demonstration samples. The algorithm also shows better sample efficiency and learning capabilities than a baseline reinforcement learning algorithm for memory-based control from demonstrations.
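To make the idea of replaying demonstrations alongside self-collected experience more concrete, the following is a minimal sketch of a replay buffer that samples fixed-length, contiguous windows of transitions from two pools. It is an illustration under stated assumptions, not the READER implementation: the class name MixedSequenceReplay, the demo_fraction parameter, and the window-sampling scheme are all hypothetical choices made for this example.

```python
import random
from collections import deque


class MixedSequenceReplay:
    """Hypothetical buffer: stores whole episodes, samples contiguous windows.

    Contiguous windows are kept (rather than single transitions) so that a
    recurrent policy could rebuild its memory from the observation history.
    """

    def __init__(self, capacity=1000, demo_fraction=0.25, seed=0):
        self.demos = []                            # demonstration episodes, never evicted
        self.experience = deque(maxlen=capacity)   # agent episodes, oldest evicted first
        self.demo_fraction = demo_fraction         # assumed share of demo windows per batch
        self.rng = random.Random(seed)

    def add_demo(self, episode):
        self.demos.append(list(episode))

    def add_experience(self, episode):
        self.experience.append(list(episode))

    def _window(self, pool, length):
        # Pick an episode that is long enough, then a random contiguous slice.
        candidates = [ep for ep in pool if len(ep) >= length]
        episode = self.rng.choice(candidates)
        start = self.rng.randrange(len(episode) - length + 1)
        return episode[start:start + length]

    def sample(self, batch_size, length):
        """Return a list of (window, is_demo) pairs."""
        has_demo = any(len(ep) >= length for ep in self.demos)
        has_exp = any(len(ep) >= length for ep in self.experience)
        if not (has_demo or has_exp):
            raise ValueError("no stored episode is long enough")
        if has_demo and has_exp:
            n_demo = round(batch_size * self.demo_fraction)
        else:
            # Fall back to whichever pool can supply windows.
            n_demo = batch_size if has_demo else 0
        batch = [(self._window(self.demos, length), True) for _ in range(n_demo)]
        batch += [(self._window(self.experience, length), False)
                  for _ in range(batch_size - n_demo)]
        return batch


# Usage with placeholder transitions of the form (time_step, source_label).
buffer = MixedSequenceReplay(demo_fraction=0.25)
buffer.add_demo([(t, "demo") for t in range(10)])
buffer.add_experience([(t, "agent") for t in range(12)])
batch = buffer.sample(batch_size=8, length=4)
```

In a setup like this, a recurrent actor-critic would process each window in time order so that its hidden state can carry information from earlier observations; how the hidden state is initialized at the start of a window is a separate design choice that this sketch does not address.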