RO LG Sep 23, 2021

Hierarchies of Planning and Reinforcement Learning for Robot Navigation

Jan Wöhlke, Felix Schmitt, Herke van Hoof

arXiv:2109.11178v216.436 citations

Originality: Incremental advance

AI Analysis

This work addresses robotic navigation challenges by providing a more adaptable hierarchical approach, though it is incremental in that it builds on existing hierarchical methods.

The paper tackles robotic navigation tasks with sparse rewards and long horizons by introducing a hierarchical framework that uses a trainable planning policy in a high-level representation, learning robot capabilities and environment conditions from rollout data. In simulated tasks, the method consistently improves over vanilla RL, matches vanilla hierarchical RL on single layouts while being more broadly applicable, and matches or exceeds trainable high-level planning baselines, with marked improvements in a parking task with non-holonomic dynamics.

Solving robotic navigation tasks via reinforcement learning (RL) is challenging due to their sparse rewards and long decision horizons. However, in many navigation tasks, high-level (HL) task representations, like a rough floor plan, are available. Previous work has demonstrated efficient learning by hierarchical approaches consisting of path planning in the HL representation and using sub-goals derived from the plan to guide the RL policy in the source task. However, these approaches usually neglect the complex dynamics and sub-optimal sub-goal-reaching capabilities of the robot during planning. This work overcomes these limitations by proposing a novel hierarchical framework that utilizes a trainable planning policy for the HL representation. Thereby, robot capabilities and environment conditions can be learned from collected rollout data. We specifically introduce a planning policy based on value iteration with a learned transition model (VI-RL). In simulated robotic navigation tasks, VI-RL yields consistent, strong improvements over vanilla RL, is on par with vanilla hierarchical RL on single layouts but more broadly applicable to multiple layouts, and is on par with trainable HL path planning baselines except for a parking task with difficult non-holonomic dynamics, where it shows marked improvements.
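
The idea of value iteration over a learned high-level transition model can be illustrated with a small tabular sketch. This is not the authors' implementation: the count-based estimator, the function names `estimate_transitions` and `value_iteration`, the four-cell corridor, and all parameter values are assumptions chosen for illustration, and the paper's actual transition model may take a different form. The sketch estimates P(s' | s, a) from rollout triples and then plans toward a goal cell, so that the greedy high-level action can serve as a sub-goal for a low-level policy.

```python
import numpy as np

def estimate_transitions(rollouts, n_states, n_actions, prior=1e-3):
    """Count-based estimate of P(s' | s, a) from (s, a, s') triples.

    Hypothetical stand-in for a learned transition model; the small
    prior keeps unseen (s, a) pairs well-defined (uniform).
    """
    counts = np.full((n_states, n_actions, n_states), prior)
    for s, a, s_next in rollouts:
        counts[s, a, s_next] += 1.0
    return counts / counts.sum(axis=2, keepdims=True)

def value_iteration(P, goal, gamma=0.95, iters=200):
    """Value iteration with reward 1 for entering the absorbing goal."""
    n_states = P.shape[0]
    reward = np.zeros(n_states)
    reward[goal] = 1.0
    V = np.zeros(n_states)
    for _ in range(iters):
        # Q[s, a] = sum_s' P[s, a, s'] * (reward[s'] + gamma * V[s'])
        Q = P @ (reward + gamma * V)
        Q[goal] = 0.0  # no further return once the goal is reached
        V = Q.max(axis=1)
    return V, Q

# Assumed toy layout: a corridor of 4 high-level cells, goal at cell 3.
# Actions: 0 = move left, 1 = move right. Rollouts are made up.
rollouts = []
for _ in range(10):
    rollouts += [(s, 1, s + 1) for s in range(3)]   # moving right succeeds
    rollouts += [(s, 0, s - 1) for s in range(1, 4)]  # moving left succeeds
    rollouts += [(0, 0, 0), (3, 1, 3)]               # walls: stay in place

P = estimate_transitions(rollouts, n_states=4, n_actions=2)
V, Q = value_iteration(P, goal=3)
policy = Q.argmax(axis=1)  # greedy high-level action per cell
```

In this toy setting, the greedy action in each non-goal cell points toward the goal, and the target cell of that action would be handed to the low-level RL policy as its sub-goal. Because the model is estimated from rollouts, transitions the robot rarely completes receive low probability and are avoided during planning.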
