Reinforcement Learning with Evolutionary Trajectory Generator: A General Approach for Quadrupedal Locomotion
This paper addresses the complex dynamics and reward sparsity that make quadrupedal robot locomotion hard to learn. It offers robotics researchers a novel method for improving learning efficiency and task performance, though the contribution is incremental in that it builds on prior RL and trajectory generation techniques.
The paper tackles the difficulty of learning effective quadrupedal gaits from scratch with reinforcement learning, especially in challenging tasks such as walking over a balance beam. It proposes an RL-based approach with an evolutionary foot trajectory generator that optimizes the trajectory shape for the task and guides policy learning. The approach solves the simulated tasks and is deployed on a 12-DoF robot, where it produces efficient gaits.
Recently, reinforcement learning (RL) has emerged as a promising approach for quadrupedal locomotion, since it can save the manual effort required by conventional approaches, such as designing skill-specific controllers. However, due to the complex nonlinear dynamics of quadrupedal robots and reward sparsity, it is still difficult for RL to learn effective gaits from scratch, especially in challenging tasks such as walking over a balance beam. To alleviate this difficulty, we propose a novel RL-based approach that contains an evolutionary foot trajectory generator. Unlike prior methods that use a fixed trajectory generator, our generator continually optimizes the shape of the output trajectory for the given task, providing diversified motion priors to guide policy learning. The policy is trained with reinforcement learning to output residual control signals that fit different gaits. We optimize the trajectory generator and the policy network alternately to stabilize training, and the two share exploratory data to improve sample efficiency. As a result, our approach can solve a range of challenging tasks in simulation by learning from scratch, including walking on a balance beam and crawling through a cave. To further verify the effectiveness of our approach, we deploy the controller learned in simulation on a 12-DoF quadrupedal robot, and it successfully traverses challenging scenarios with efficient gaits.
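The alternating scheme described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the two-parameter trajectory shape (step length and step height), the linear residual "policy" on phase features, the tracking-error return, and all function names. In particular, the policy update below is a simple elitist perturbation search standing in for an RL algorithm, which the abstract does not specify, and the sharing of exploratory data between the two optimizers is not modelled.

```python
import math
import random


def foot_trajectory(phase, shape):
    """Hypothetical open-loop foot offset (x, z) for a gait phase in [0, 1)."""
    step_length, step_height = shape
    x = 0.5 * step_length * math.cos(2.0 * math.pi * phase)
    z = step_height * max(0.0, math.sin(2.0 * math.pi * phase))
    return (x, z)


def residual(phase, weights):
    """Toy linear policy: residual (dx, dz) computed from three phase features."""
    feats = (math.sin(2.0 * math.pi * phase), math.cos(2.0 * math.pi * phase), 1.0)
    dx = sum(w * f for w, f in zip(weights[0:3], feats))
    dz = sum(w * f for w, f in zip(weights[3:6], feats))
    return (dx, dz)


def episode_return(shape, weights, target_shape=(0.2, 0.08), n_steps=20):
    """Toy return (assumption): negative squared error to a target foot path."""
    total = 0.0
    for k in range(n_steps):
        phase = k / n_steps
        px, pz = foot_trajectory(phase, shape)        # motion prior from the generator
        dx, dz = residual(phase, weights)             # residual signal from the policy
        tx, tz = foot_trajectory(phase, target_shape)
        total -= (px + dx - tx) ** 2 + (pz + dz - tz) ** 2
    return total


def improve(current, score_fn, rng, sigma, pop_size=8):
    """Elitist perturbation search: adopt the best candidate only if it beats the current one."""
    best, best_score = current, score_fn(current)
    for _ in range(pop_size):
        candidate = [v + rng.gauss(0.0, sigma) for v in current]
        score = score_fn(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


def train(n_rounds=30, seed=0):
    """Alternate between updating the trajectory shape and the residual policy."""
    rng = random.Random(seed)
    shape = [0.05, 0.02]     # initial step length and step height (arbitrary)
    weights = [0.0] * 6      # residual policy starts at zero
    history = [episode_return(shape, weights)]
    for _ in range(n_rounds):
        # Evolutionary step on the generator, with the policy held fixed.
        shape = improve(shape, lambda s: episode_return(s, weights), rng, sigma=0.02)
        # Policy step with the generator held fixed (stand-in for an RL update).
        weights = improve(weights, lambda w: episode_return(shape, w), rng, sigma=0.01)
        history.append(episode_return(shape, weights))
    return shape, weights, history
```

Calling `train()` returns the final shape parameters, the residual weights, and the return after each round. Because each step keeps the current parameters unless a candidate scores higher, the recorded return never decreases in this toy setting; a real RL update offers no such guarantee.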