Reinforcement Learning Experiments and Benchmark for Solving Robotic Reaching Tasks
This work provides a systematic benchmark for robotic reaching. The contribution is incremental: it builds on existing methods to improve reproducibility and performance in a specific domain.
The paper tackles the problem of comparing model-free reinforcement learning algorithms for robotic reaching tasks by defining a robust experimental procedure, and finds that augmenting the reward signal with Hindsight Experience Replay increases the average return of off-policy agents by 7 to 9 times when targets are randomly initialized.
Reinforcement learning has shown great promise in robotics thanks to its ability to develop efficient robotic control procedures through self-training. In particular, reinforcement learning has been successfully applied to solving the reaching task with robotic arms. In this paper, we define a robust, reproducible and systematic experimental procedure to compare the performance of various model-free algorithms at solving this task. The policies are trained in simulation and then transferred to a physical robotic manipulator. We show that augmenting the reward signal with the Hindsight Experience Replay exploration technique increases the average return of off-policy agents by a factor of 7 to 9 when the target position is initialised randomly at the beginning of each episode.
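
To illustrate the idea behind Hindsight Experience Replay in a reaching setting, the following is a minimal sketch and not the paper's implementation. It assumes a sparse reward (0 within a tolerance of the target, -1 otherwise), a "final" relabelling strategy, and a toy point-mass arm; the names `sparse_reward`, `her_relabel_final` and the transition dictionary keys are hypothetical and chosen only for this example.

```python
import numpy as np


def sparse_reward(achieved, goal, tol=0.05):
    """Assumed sparse reward: 0 if within `tol` of the goal, else -1."""
    return 0.0 if np.linalg.norm(achieved - goal) <= tol else -1.0


def her_relabel_final(episode, tol=0.05):
    """Copy each transition, replacing the goal with the position that was
    actually reached at the end of the episode ("final" strategy)."""
    new_goal = episode[-1]["achieved_next"]
    relabeled = []
    for t in episode:
        relabeled.append({
            "obs": t["obs"],
            "action": t["action"],
            "achieved_next": t["achieved_next"],
            "goal": new_goal,
            # Reward is recomputed against the substituted goal.
            "reward": sparse_reward(t["achieved_next"], new_goal, tol),
        })
    return relabeled


# Toy episode: a point "end effector" takes small random steps and
# never gets close to the randomly placed target.
rng = np.random.default_rng(0)
goal = np.array([0.5, 0.5, 0.5])
pos = np.zeros(3)
episode = []
for _ in range(5):
    action = rng.uniform(-0.05, 0.05, size=3)
    nxt = pos + action
    episode.append({
        "obs": pos.copy(),
        "action": action,
        "achieved_next": nxt.copy(),
        "goal": goal,
        "reward": sparse_reward(nxt, goal),
    })
    pos = nxt

# An off-policy agent would sample from both original and relabelled data.
replay_buffer = episode + her_relabel_final(episode)
```

In this sketch every original transition carries a reward of -1, while the relabelled copy of the last transition carries a reward of 0, so the replay buffer contains at least one successful example even though the real target was never reached.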