AI RO May 7, 2023

Train a Real-world Local Path Planner in One Hour via Partially Decoupled Reinforcement Learning and Vectorized Diversity

Jinghao Xin, Jinwoo Kim, Zhi Li, Ning Li

arXiv:2305.04180v35.44 citations Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the problem of deploying DRL-based path planners on real-world robots by improving training efficiency and generalization. It appears incremental, since it builds on existing DRL methods and adds optimizations to them.

The paper tackles the inefficient training and poor generalization of Deep Reinforcement Learning (DRL) for real-world Local Path Planning (LPP). It proposes Color, which combines an Actor-Sharer-Learner training framework with a lightweight simulator, and reports training in one hour with improved performance across 57 DRL benchmark environments and 68 LPP scenarios (32 simulated, 36 real-world).

Deep Reinforcement Learning (DRL) has exhibited efficacy in resolving the Local Path Planning (LPP) problem. However, such application in the real world is immensely limited due to the deficient training efficiency and generalization capability of DRL. To alleviate these two issues, a solution named Color is proposed, which consists of an Actor-Sharer-Learner (ASL) training framework and a mobile robot-oriented simulator Sparrow. Specifically, the ASL intends to improve the training efficiency of DRL algorithms. It employs a Vectorized Data Collection (VDC) mode to expedite data acquisition, decouples the data collection from model optimization by multithreading, and partially connects the two procedures by harnessing a Time Feedback Mechanism (TFM) to evade data underuse or overuse. Meanwhile, the Sparrow simulator utilizes a 2D grid-based world, simplified kinematics, and conversion-free data flow to achieve a lightweight design. The lightness facilitates vectorized diversity, allowing diversified simulation setups across extensive copies of the vectorized environments, resulting in a notable enhancement in the generalization capability of the DRL algorithm being trained. Comprehensive experiments, comprising 57 DRL benchmark environments, 32 simulated and 36 real-world LPP scenarios, have been conducted to corroborate the superiority of our method in terms of efficiency and generalization. The code and the video of this paper are accessible at https://github.com/XinJingHao/Color.
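The abstract says the ASL decouples data collection from model optimization and then partially reconnects them through a Time Feedback Mechanism so that data is neither underused nor overused. It does not state the rule itself, so the following is only a minimal sketch of one plausible reading: each side times its own work, and whichever side finishes its share of a cycle first idles until the other catches up, which keeps the ratio of collected steps to gradient updates fixed. The names `Pacing`, `time_feedback`, `steps_per_cycle`, and `updates_per_cycle` are hypothetical and are not taken from the Color repository.

```python
from dataclasses import dataclass


@dataclass
class Pacing:
    """How long each thread should idle in the next cycle (seconds)."""
    actor_sleep: float    # idle time for the data-collection thread
    learner_sleep: float  # idle time for the model-optimization thread


def time_feedback(actor_time: float, learner_time: float,
                  steps_per_cycle: int, updates_per_cycle: int) -> Pacing:
    """Balance collection and optimization (assumed rule, not the paper's).

    actor_time:        measured seconds per vectorized environment step
    learner_time:      measured seconds per gradient update
    steps_per_cycle:   environment steps the actor should collect per cycle
    updates_per_cycle: gradient updates the learner should run per cycle
    """
    if actor_time < 0 or learner_time < 0:
        raise ValueError("measured times must be non-negative")
    if steps_per_cycle <= 0 or updates_per_cycle <= 0:
        raise ValueError("per-cycle counts must be positive")

    # Wall-clock time each side needs to finish its share of one cycle.
    actor_cycle = actor_time * steps_per_cycle
    learner_cycle = learner_time * updates_per_cycle

    # The faster side waits for the slower one; the slower side never waits.
    return Pacing(
        actor_sleep=max(0.0, learner_cycle - actor_cycle),
        learner_sleep=max(0.0, actor_cycle - learner_cycle),
    )
```

In a threaded setup, each thread would time its own work, call `time_feedback` with the latest measurements, and pass its share to `time.sleep`. If the learner is the faster side, the sleep stops it from reusing the same transitions too often (overuse); if the actor is faster, the sleep stops it from overwriting data the learner has not yet sampled (underuse).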
