RO | LG | SY | Mar 11, 2021

Robust High-speed Running for Quadruped Robots via Deep Reinforcement Learning

arXiv:2103.06484v2 | 68 citations
AI Analysis

This work addresses efficient and robust locomotion for quadruped robots, particularly high-speed running over rough terrain with heavy loads. It represents an incremental improvement over existing methods, improving sample efficiency and reducing human bias.

The paper develops robust high-speed running controllers for quadruped robots using a deep reinforcement learning framework that operates in task space. The resulting policies reach over 4 m/s without a load and 3.5 m/s with a 10 kg load in simulation, and 2 m/s with a 5 kg load after sim-to-real transfer to the Unitree A1.
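To make "operates in task space" concrete, the following is a minimal sketch and not the paper's controller. It assumes a hypothetical two-link planar leg (link lengths `L1`, `L2`), a desired foot position supplied by some policy, and a Cartesian PD law mapped to joint torques through the Jacobian transpose. The gains, link lengths, and function names are all illustrative assumptions.

```python
import math

# Hypothetical link lengths in metres (illustrative, not from the paper).
L1, L2 = 0.2, 0.2


def foot_position(q1, q2):
    """Foot position (x, z) in the hip frame for hip angle q1 and knee angle q2."""
    x = L1 * math.sin(q1) + L2 * math.sin(q1 + q2)
    z = -L1 * math.cos(q1) - L2 * math.cos(q1 + q2)
    return x, z


def jacobian(q1, q2):
    """2x2 Jacobian d(x, z)/d(q1, q2) of foot_position."""
    return (
        (L1 * math.cos(q1) + L2 * math.cos(q1 + q2), L2 * math.cos(q1 + q2)),
        (L1 * math.sin(q1) + L2 * math.sin(q1 + q2), L2 * math.sin(q1 + q2)),
    )


def task_space_torques(q, qd, p_des, kp=500.0, kd=10.0):
    """Cartesian PD on the foot, mapped to joint torques with J^T.

    q, qd  : joint angles and velocities (q1, q2)
    p_des  : desired foot position (x, z), e.g. chosen by a learned policy
    kp, kd : assumed gains, chosen only for illustration
    """
    x, z = foot_position(*q)
    J = jacobian(*q)
    # Foot velocity from joint velocities: v = J * qd
    vx = J[0][0] * qd[0] + J[0][1] * qd[1]
    vz = J[1][0] * qd[0] + J[1][1] * qd[1]
    # Virtual force at the foot
    fx = kp * (p_des[0] - x) - kd * vx
    fz = kp * (p_des[1] - z) - kd * vz
    # Joint torques: tau = J^T * f
    tau1 = J[0][0] * fx + J[1][0] * fz
    tau2 = J[0][1] * fx + J[1][1] * fz
    return tau1, tau2


if __name__ == "__main__":
    q = (0.3, -0.6)
    x, z = foot_position(*q)
    # Ask the foot to move 1 cm further down from where it is now.
    print(task_space_torques(q, (0.0, 0.0), (x, z - 0.01)))
```

In a setup like this the policy outputs foot-space targets rather than joint commands, and the mapping to torques is handled by the fixed kinematic layer.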

Deep reinforcement learning has emerged as a popular and powerful way to develop locomotion controllers for quadruped robots. Common approaches have largely focused on learning actions directly in joint space, or learning to modify and offset foot positions produced by trajectory generators. Both approaches typically require careful reward shaping and training for millions of time steps, and with trajectory generators introduce human bias into the resulting control policies. In this paper, we present a learning framework that leads to the natural emergence of fast and robust bounding policies for quadruped robots. The agent both selects and controls actions directly in task space to track desired velocity commands subject to environmental noise including model uncertainty and rough terrain. We observe that this framework improves sample efficiency, necessitates little reward shaping, leads to the emergence of natural gaits such as galloping and bounding, and eases the sim-to-real transfer at running speeds. Policies can be learned in only a few million time steps, even for challenging tasks of running over rough terrain with loads of over 100% of the nominal quadruped mass. Training occurs in PyBullet, and we perform a sim-to-sim transfer to Gazebo and sim-to-real transfer to the Unitree A1 hardware. For sim-to-sim, our results show the quadruped is able to run at over 4 m/s without a load, and 3.5 m/s with a 10 kg load, which is over 83% of the nominal quadruped mass. For sim-to-real, the Unitree A1 is able to bound at 2 m/s with a 5 kg load, representing 42% of the nominal quadruped mass.
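The load percentages quoted above are mutually consistent with a nominal robot mass of roughly 12 kg. That figure is an assumption inferred from the stated ratios; it is not given in the abstract. A short check of the arithmetic:

```python
def payload_fraction(load_kg, nominal_mass_kg):
    """Return the load as a fraction of the robot's nominal mass."""
    return load_kg / nominal_mass_kg


# Assumption: about 12 kg, inferred from "10 kg is over 83%" and "5 kg is 42%".
NOMINAL_MASS_KG = 12.0

for load in (5.0, 10.0):
    print(f"{load} kg -> {payload_fraction(load, NOMINAL_MASS_KG):.1%}")
# 5.0 kg -> 41.7%
# 10.0 kg -> 83.3%
```

Under the same assumption, the "over 100% of the nominal quadruped mass" case mentioned for training would correspond to loads above about 12 kg.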
