Hierarchical Reinforcement Learning and Value Optimization for Challenging Quadruped Locomotion
This work addresses quadruped robot locomotion, particularly on difficult terrain, and represents an incremental improvement over existing methods.
The authors address quadruped locomotion over challenging terrain with a hierarchical reinforcement learning framework in which a high-level policy selects goals for a low-level policy via online optimization. Compared with an end-to-end RL approach, the framework achieves higher rewards and fewer collisions.
We propose a novel hierarchical reinforcement learning (RL) framework for quadruped locomotion over challenging terrain. Our approach incorporates a two-layer hierarchy in which a high-level policy (HLP) selects optimal goals for a low-level policy (LLP). The LLP is trained using an on-policy actor-critic RL algorithm and is given footstep placements as goals. The HLP requires no additional training or environment samples; instead, it operates via an online optimization process over the learned value function of the LLP. We demonstrate the benefits of this framework by comparing it with an end-to-end RL approach. The framework achieves higher rewards with fewer collisions across an array of different terrains, including terrains more difficult than any encountered during training.
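
The idea of an HLP that picks goals by optimizing over the LLP's learned value function can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not specify the optimizer, the state representation, or the critic, so `toy_value` is a hypothetical stand-in for the learned value function V(s, g), and random candidate sampling stands in for whatever online optimization the authors actually use.

```python
import numpy as np


def toy_value(state, goals):
    """Hypothetical stand-in for the LLP critic V(s, g); not the paper's model.

    Scores each candidate footstep goal (x, y) higher when it lies close to a
    nominal stride ahead of the robot and away from a single obstacle.
    """
    preferred = state["position"] + np.array([0.3, 0.0])  # assumed nominal stride
    stride_cost = np.sum((goals - preferred) ** 2, axis=1)
    obstacle_dist = np.linalg.norm(goals - state["obstacle"], axis=1)
    obstacle_cost = np.exp(-((obstacle_dist / 0.1) ** 2))
    return -stride_cost - obstacle_cost


def select_goal(value_fn, state, n_candidates=256, radius=0.5, seed=0):
    """Pick the footstep goal that maximizes value_fn at the current state.

    No training or environment interaction happens here: the function only
    queries value_fn on sampled candidates and returns the best one.
    """
    rng = np.random.default_rng(seed)
    # Sample candidate footstep goals in a square around the robot.
    offsets = rng.uniform(-radius, radius, size=(n_candidates, 2))
    candidates = state["position"] + offsets
    values = value_fn(state, candidates)
    best = int(np.argmax(values))
    return candidates[best], float(values[best])


# Assumed toy state: robot at the origin, obstacle exactly on the nominal stride.
state = {"position": np.array([0.0, 0.0]), "obstacle": np.array([0.3, 0.0])}
goal, value = select_goal(toy_value, state)
print("selected footstep goal:", goal, "value:", value)
```

In this toy setting the obstacle sits exactly where the nominal stride would land, so the selected goal is pushed to one side of it. A gradient-based or cross-entropy optimizer could replace the random sampling without changing the interface.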