Dynamic Experience Replay
This work addresses training inefficiencies in reinforcement learning for robotic manipulation, offering a domain-specific improvement that is incremental but impactful for such applications.
The paper tackles slow training in reinforcement learning for robotic assembly tasks. It introduces Dynamic Experience Replay, which adds both human demonstrations and successful agent-generated transitions to the replay buffer. This either significantly shortens training or enables the agent to solve tasks that baseline methods cannot.
We present a novel technique called Dynamic Experience Replay (DER) that allows Reinforcement Learning (RL) algorithms to draw experience replay samples not only from human demonstrations but also from successful transitions generated by RL agents during training, thereby improving training efficiency. DER can be combined with any off-policy RL algorithm, such as DDPG or DQN, and with their distributed versions. We build upon Ape-X DDPG and demonstrate our approach on robotic tight-fitting joint assembly tasks, using force/torque and Cartesian pose observations. In particular, we run experiments on two tasks: peg-in-hole and lap-joint. For each task, we compare different replay buffer structures and how DER affects them. Our ablation studies show that Dynamic Experience Replay is a crucial ingredient: it either substantially shortens training time in these challenging environments or solves tasks that vanilla Ape-X DDPG cannot solve. We also show that policies learned purely in simulation can be deployed successfully on the real robot. A video presenting our experiments is available at https://sites.google.com/site/dynamicexperiencereplay