TADPO: Reinforcement Learning Goes Off-road

Zhouchonghao Wu, Raymond Song, Vedant Mundheda, Luis E. Navarro-Serment, Christof Schoenborn, Jeff Schneider

arXiv:2603.05995v111.8 · h-index: 2

Predicted impact: top 27% in RO · last 90 days · Originality: Highly original

AI Analysis

This work addresses off-road autonomous driving for full-scale vehicles, a setting in which, to the authors' knowledge, RL-based policies had not previously been deployed.

This paper tackles the problem of high-speed off-road autonomous driving using reinforcement learning, a challenging long-horizon task with low-signal rewards. The authors developed TADPO, a novel policy gradient formulation that extends PPO, and deployed it zero-shot from simulation to a full-scale off-road vehicle, demonstrating its ability to navigate extreme slopes and obstacle-rich terrain.

Off-road autonomous driving poses significant challenges such as navigating unmapped, variable terrain with uncertain and diverse dynamics. Addressing these challenges requires effective long-horizon planning and adaptable control. Reinforcement Learning (RL) offers a promising solution by learning control policies directly from interaction. However, because off-road driving is a long-horizon task with low-signal rewards, standard RL methods are challenging to apply in this setting. We introduce TADPO, a novel policy gradient formulation that extends Proximal Policy Optimization (PPO), leveraging off-policy trajectories for teacher guidance and on-policy trajectories for student exploration. Building on this, we develop a vision-based, end-to-end RL system for high-speed off-road driving, capable of navigating extreme slopes and obstacle-rich terrain. We demonstrate our performance in simulation and, importantly, zero-shot sim-to-real transfer on a full-scale off-road vehicle. To our knowledge, this work represents the first deployment of RL-based policies on a full-scale off-road platform.
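The abstract does not state the TADPO objective, so the sketch below is not the authors' formulation. It only illustrates the two ingredients the abstract names: the standard PPO clipped surrogate computed on on-policy (student) samples, and a generic guidance term computed on off-policy (teacher) samples. The guidance term here is a plain negative log-likelihood of the teacher's actions, chosen purely as an assumption; the function names, the `guidance_weight` parameter, and the way the two terms are summed are all hypothetical.

```python
import numpy as np


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate, returned as a loss to minimize.

    logp_new / logp_old: log-probabilities of the sampled actions under the
    current policy and under the policy that collected the data.
    """
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    advantages = np.asarray(advantages, dtype=float)

    ratio = np.exp(logp_new - logp_old)  # importance ratio
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the elementwise minimum; negate to obtain a loss.
    return -np.mean(np.minimum(unclipped, clipped))


def teacher_imitation_loss(logp_student_on_teacher_actions):
    """Hypothetical guidance term (an assumption, not from the paper):
    negative log-likelihood of teacher actions under the student policy."""
    logp = np.asarray(logp_student_on_teacher_actions, dtype=float)
    return -np.mean(logp)


def guided_loss(logp_new, logp_old, advantages,
                logp_student_on_teacher_actions,
                guidance_weight=0.5, clip_eps=0.2):
    """Illustrative combination: on-policy PPO term plus a weighted
    off-policy teacher term. The weighting scheme is an assumption."""
    on_policy = ppo_clip_loss(logp_new, logp_old, advantages, clip_eps)
    guidance = teacher_imitation_loss(logp_student_on_teacher_actions)
    return on_policy + guidance_weight * guidance
```

In this sketch, setting `guidance_weight` to zero recovers plain PPO, which makes it easy to see how much the teacher term alone changes the loss on a given batch.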
