Model-free Reinforcement Learning for Robust Locomotion using Demonstrations from Trajectory Optimization
This work addresses robust locomotion on real robots, but it is incremental, since it builds on existing reinforcement learning (RL) and trajectory optimization methods.
The authors tackled the problem of creating robust locomotion policies for real robots with a two-stage reinforcement learning approach that starts from a single demonstration generated by trajectory optimization. They demonstrated the method on hopping and bounding tasks for a quadruped robot and achieved robust performance without any additional real-world training.
We present a general, two-stage reinforcement learning approach that uses a single demonstration generated by trajectory optimization to create robust policies, which can be deployed on real robots without any additional training. In the first stage, the demonstration serves as a starting point that facilitates initial exploration. In the second stage, the relevant task reward is optimized directly, yielding a policy that is robust to environment uncertainties. We demonstrate and examine in detail the performance and robustness of our approach on highly dynamic hopping and bounding tasks on a quadruped robot.
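The abstract does not specify how the demonstration enters the first stage or how robustness is obtained in the second. The following is a minimal sketch under one assumed pattern, not the paper's method: stage 1 blends an exponential demonstration-tracking term with the task reward under nominal physics, and stage 2 uses the task reward alone while sampling physical parameters per episode (domain randomization, a technique swapped in here for illustration). All names (`EnvParams`, `imitation_reward`, `stage_reward`, `sample_env_params`), the weight `w_imit`, the tracking `scale`, and the randomization ranges are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class EnvParams:
    """Hypothetical physical parameters; stage 2 randomizes them."""
    mass_scale: float = 1.0
    friction: float = 0.8


def imitation_reward(state, demo_state, scale=5.0):
    """Hypothetical tracking term: 1.0 when the state matches the demo
    state exactly, decaying exponentially with squared distance."""
    sq_err = sum((s - d) ** 2 for s, d in zip(state, demo_state))
    return math.exp(-scale * sq_err)


def stage_reward(stage, task_r, state, demo_state, w_imit=0.5):
    """Stage 1 blends demo tracking with the task reward to guide early
    exploration; stage 2 optimizes the task reward alone."""
    if stage == 1:
        return w_imit * imitation_reward(state, demo_state) + (1.0 - w_imit) * task_r
    if stage == 2:
        return task_r
    raise ValueError(f"unknown stage: {stage}")


def sample_env_params(stage, rng):
    """Stage 1 uses nominal parameters; stage 2 samples them from assumed
    ranges so the policy is trained under model uncertainty."""
    if stage == 1:
        return EnvParams()
    return EnvParams(mass_scale=rng.uniform(0.8, 1.2),
                     friction=rng.uniform(0.5, 1.1))


# Usage: compute one step's reward and the episode's physics in each stage.
rng = random.Random(0)
for stage in (1, 2):
    params = sample_env_params(stage, rng)
    r = stage_reward(stage, task_r=0.2, state=[0.0, 1.0], demo_state=[0.1, 0.9])
    print(stage, params, round(r, 3))
```

Keeping the two stages as separate reward and parameter functions makes the switch from demonstration-guided exploration to direct task optimization a single change of `stage`, which is one plausible way to structure such a schedule.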