RO, LG · Jul 1, 2025

Jump-Start Reinforcement Learning with Self-Evolving Priors for Extreme Monopedal Locomotion

Ziang Zheng, Guojian Zhan, Shiqi Liu, Yao Lyu, Tao Zhang, Shengbo Eben Li

arXiv:2507.01243v1 · 5.71 citations · h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses a significant robotics problem, agile locomotion under extreme conditions, but appears incremental: it builds on existing RL methods and contributes mainly a new training structure.

The paper tackles the challenge of training reinforcement learning policies for monopedal hopping on quadruped robots, a task that combines extreme underactuation with extreme terrain. It proposes JumpER, a framework that pairs self-evolving priors with a multi-stage curriculum. According to the authors, this enables quadruped robots to hop robustly on a single leg over unpredictable terrain for the first time, including gaps up to 60 cm wide and stepping stones spaced 15 cm to 35 cm apart.

Reinforcement learning (RL) has shown great potential in enabling quadruped robots to perform agile locomotion. However, directly training policies to simultaneously handle dual extreme challenges, i.e., extreme underactuation and extreme terrains, as in monopedal hopping tasks, remains highly challenging due to unstable early-stage interactions and unreliable reward feedback. To address this, we propose JumpER (jump-start reinforcement learning via self-evolving priors), an RL training framework that structures policy learning into multiple stages of increasing complexity. By dynamically generating self-evolving priors through iterative bootstrapping of previously learned policies, JumpER progressively refines and enhances guidance, thereby stabilizing exploration and policy optimization without relying on external expert priors or handcrafted reward shaping. Specifically, when integrated with a structured three-stage curriculum that incrementally evolves action modality, observation space, and task objective, JumpER enables quadruped robots to achieve robust monopedal hopping on unpredictable terrains for the first time. Remarkably, the resulting policy effectively handles challenging scenarios that traditional methods struggle to conquer, including wide gaps up to 60 cm, irregularly spaced stairs, and stepping stones with distances varying from 15 cm to 35 cm. JumpER thus provides a principled and scalable approach for addressing locomotion tasks under the dual challenges of extreme underactuation and extreme terrains.
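The staged structure described in the abstract can be illustrated with a deliberately tiny toy. This is not the authors' algorithm or code: the "policy" here is a single scalar, training is plain gradient ascent on a quadratic objective, and the names `Stage`, `train_stage`, `run_curriculum`, `target`, and `guide_weight` are all hypothetical. The sketch only shows the bootstrapping pattern: each stage starts from, and is initially pulled toward, the policy learned in the previous stage, with that guidance annealed to zero so no external expert prior is needed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stage:
    name: str      # hypothetical stage label
    target: float  # hypothetical stand-in for the stage's task difficulty


def train_stage(stage: Stage, prior: Optional[float], steps: int = 200,
                lr: float = 0.1, guide_weight: float = 0.5) -> float:
    """Toy 'policy optimization' on one scalar parameter.

    Maximizes -(theta - target)^2 plus, when a prior exists, a guidance
    term -w_t * (theta - prior)^2 whose weight w_t decays linearly to 0.
    """
    # Jump-start: begin from the previous stage's policy if there is one.
    theta = 0.0 if prior is None else prior
    for t in range(steps):
        grad = -2.0 * (theta - stage.target)       # task objective
        if prior is not None:
            w = guide_weight * (1.0 - t / steps)   # annealed guidance weight
            grad += -2.0 * w * (theta - prior)     # pull toward the prior
        theta += lr * grad
    return theta


def run_curriculum(stages: List[Stage]) -> List[float]:
    """Train stages in order; each learned policy becomes the next prior."""
    prior: Optional[float] = None
    learned: List[float] = []
    for stage in stages:
        theta = train_stage(stage, prior)
        learned.append(theta)
        prior = theta  # the prior "self-evolves" from the agent's own policy
    return learned


if __name__ == "__main__":
    stages = [Stage("stage_1", 1.0), Stage("stage_2", 2.0), Stage("stage_3", 3.0)]
    for stage, theta in zip(stages, run_curriculum(stages)):
        print(f"{stage.name}: learned parameter {theta:.3f}")
```

In the paper's setting each stage would instead be a full RL training run, and the curriculum would change the action modality, observation space, and task objective between stages; none of those details are modeled here.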
