RO · CV · Jan 31, 2025

Mobile Robot Navigation Using Hand-Drawn Maps: A Vision Language Model Approach

arXiv:2502.00114v2 · 13 citations · h-index: 35 · IEEE Robot Autom Lett
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of enabling robots to interpret human-drawn maps for navigation. The advance is rated incremental because it builds on existing vision language models.

The paper tackles mobile robot navigation from inaccurate hand-drawn maps. It introduces the HAM-Nav architecture, which leverages vision language models to achieve high navigation success rates in simulated and real-world environments, outperforming a non-hand-drawn map approach.

Hand-drawn maps can be used to convey navigation instructions between humans and robots in a natural and efficient manner. However, these maps often contain inaccuracies, such as scale distortions and missing landmarks, which present challenges for mobile robot navigation. This paper introduces a novel Hand-drawn Map Navigation (HAM-Nav) architecture that leverages pre-trained vision language models (VLMs) for robot navigation across diverse environments, hand-drawing styles, and robot embodiments, even in the presence of map inaccuracies. HAM-Nav integrates a unique Selective Visual Association Prompting approach for topological map-based position estimation and navigation planning, as well as a Predictive Navigation Plan Parser to infer missing landmarks. Extensive experiments were conducted in photorealistic simulated environments, using both wheeled and legged robots, demonstrating the effectiveness of HAM-Nav in terms of navigation success rates and Success weighted by Path Length (SPL). Furthermore, a user study in real-world environments highlighted the practical utility of hand-drawn maps for robot navigation, as well as successful navigation outcomes compared against a non-hand-drawn map approach.
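The abstract reports Success weighted by Path Length without defining it. As commonly defined in the embodied-navigation literature, the metric averages, over episodes, a success indicator multiplied by the ratio of the shortest path length to the larger of the shortest and actually travelled lengths. The sketch below illustrates that standard definition only; the function and argument names are hypothetical, and the paper's own evaluation code and success criterion may differ.

```python
from typing import Sequence


def success_weighted_by_path_length(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    actual_lengths: Sequence[float],
) -> float:
    """Mean over episodes of S_i * l_i / max(p_i, l_i).

    S_i: whether episode i succeeded (the criterion is assumed, not taken
         from the paper).
    l_i: shortest path length from start to goal.
    p_i: path length the robot actually travelled.
    """
    if not (len(successes) == len(shortest_lengths) == len(actual_lengths)):
        raise ValueError("all inputs must have one entry per episode")
    if len(successes) == 0:
        raise ValueError("need at least one episode")

    total = 0.0
    for ok, shortest, actual in zip(successes, shortest_lengths, actual_lengths):
        if shortest <= 0:
            raise ValueError("shortest path length must be positive")
        if ok:
            # A failed episode contributes zero; a success is discounted
            # by how much longer the travelled path was than the optimum.
            total += shortest / max(actual, shortest)
    return total / len(successes)


if __name__ == "__main__":
    score = success_weighted_by_path_length(
        successes=[True, True, False],
        shortest_lengths=[10.0, 10.0, 10.0],
        actual_lengths=[10.0, 20.0, 5.0],
    )
    print(f"SPL = {score:.2f}")
```

In this illustrative run, one optimal success, one success along a path twice the shortest length, and one failure give an SPL of 0.5, which shows how the metric penalizes detours that a plain success rate would ignore.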

