CV · RO · Nov 10, 2025

PanoNav: Mapless Zero-Shot Object Navigation with Panoramic Scene Parsing and Dynamic Memory

arXiv:2511.06840v1 · 10.23 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of enabling robots to navigate to objects without prior maps or depth sensors. It appears incremental, however: it builds on existing mapless approaches, mainly by improving decision-making so the agent avoids local deadlocks.

The paper tackles zero-shot object navigation in unseen environments for household robots. It proposes PanoNav, a fully RGB-only, mapless framework that combines panoramic scene parsing with a dynamic memory, and reports that PanoNav significantly outperforms baselines in Success Rate (SR) and Success weighted by Path Length (SPL) on a public benchmark.
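SR and SPL are the standard object-navigation metrics. SR is the fraction of episodes in which the agent stops successfully near a target, and SPL averages, over all N episodes, S_i · l_i / max(p_i, l_i), where S_i is 1 for a success and 0 otherwise, l_i is the shortest-path length from start to goal, and p_i is the length of the path the agent actually took. The minimal sketch below computes both from per-episode records. The `Episode` record and its field names are hypothetical, and the paper's exact success criterion (for example, its distance threshold) is not stated here, so success is simply taken as given.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    # Hypothetical per-episode record; success is assumed to be precomputed.
    success: bool          # S_i: did the episode end in success?
    shortest_path: float   # l_i: shortest-path length from start to goal (assumed > 0)
    agent_path: float      # p_i: length of the path the agent actually took


def success_rate(episodes: Sequence[Episode]) -> float:
    """SR: fraction of episodes that ended in success."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(ep.success for ep in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i)."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for ep in episodes:
        if ep.success:
            # A success on the shortest path scores 1; detours lower the score.
            total += ep.shortest_path / max(ep.agent_path, ep.shortest_path)
    return total / len(episodes)


episodes = [
    Episode(success=True, shortest_path=5.0, agent_path=10.0),   # path twice as long
    Episode(success=True, shortest_path=4.0, agent_path=4.0),    # optimal path
    Episode(success=False, shortest_path=6.0, agent_path=20.0),  # failure counts as 0
]
print(success_rate(episodes))  # 0.6666666666666666
print(spl(episodes))           # 0.5
```

Because SPL is bounded above by SR, a method can raise SR while its SPL stays flat if it reaches targets through long detours; reporting both, as the paper does, captures success and efficiency together.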

Zero-shot object navigation (ZSON) in unseen environments remains a challenging problem for household robots, requiring strong perceptual understanding and decision-making capabilities. While recent methods leverage metric maps and Large Language Models (LLMs), they often depend on depth sensors or prebuilt maps, limiting the spatial reasoning ability of Multimodal Large Language Models (MLLMs). Mapless ZSON approaches have emerged to address this, but they typically make short-sighted decisions, leading to local deadlocks due to a lack of historical context. We propose PanoNav, a fully RGB-only, mapless ZSON framework that integrates a Panoramic Scene Parsing module to unlock the spatial parsing potential of MLLMs from panoramic RGB inputs, and a Memory-guided Decision-Making mechanism enhanced by a Dynamic Bounded Memory Queue to incorporate exploration history and avoid local deadlocks. Experiments on the public navigation benchmark show that PanoNav significantly outperforms representative baselines in both SR and SPL metrics.
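The abstract does not specify how the Dynamic Bounded Memory Queue is implemented. The sketch below is therefore only one plausible reading, built on stated assumptions. It keeps the most recent exploration steps in a capacity-limited FIFO whose bound can be changed at run time, renders that history as text that could be placed in an MLLM prompt, and flags a simple back-and-forth oscillation as a possible local deadlock. `StepRecord`, `BoundedMemoryQueue`, `resize`, `to_prompt`, and `is_oscillating` are all hypothetical names, and the oscillation heuristic is an illustrative stand-in, not the paper's mechanism.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class StepRecord:
    step: int
    direction: str  # hypothetical: panoramic sector chosen, e.g. "left"
    summary: str    # hypothetical: short text description of what was seen


class BoundedMemoryQueue:
    """Hypothetical sketch: keep only the newest `capacity` exploration steps."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self._items: Deque[StepRecord] = deque(maxlen=capacity)

    def push(self, record: StepRecord) -> None:
        # A deque with maxlen silently evicts the oldest entry when full.
        self._items.append(record)

    def resize(self, capacity: int) -> None:
        # Assumed "dynamic" behavior: change the bound, keeping the newest entries.
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self._items = deque(self._items, maxlen=capacity)

    def __len__(self) -> int:
        return len(self._items)

    def recent_directions(self) -> List[str]:
        return [r.direction for r in self._items]

    def is_oscillating(self, window: int = 4) -> bool:
        """Illustrative heuristic: last `window` choices alternate A, B, A, B."""
        dirs = self.recent_directions()[-window:]
        if len(dirs) < window:
            return False
        return len(set(dirs)) == 2 and all(
            dirs[i] == dirs[i + 2] for i in range(window - 2)
        )

    def to_prompt(self) -> str:
        # Assumption: history is given to the decision-maker as plain text.
        lines = [f"step {r.step}: went {r.direction}; saw {r.summary}" for r in self._items]
        return "\n".join(lines) if lines else "no history yet"


memory = BoundedMemoryQueue(capacity=4)
for i, d in enumerate(["front", "left", "right", "left", "right"]):
    memory.push(StepRecord(step=i, direction=d, summary="hallway"))
print(memory.recent_directions())  # ['left', 'right', 'left', 'right']
print(memory.is_oscillating())     # True
```

A bounded queue keeps the prompt size constant no matter how long an episode runs, at the cost of forgetting older steps. That trade-off is one reason a bound that can be adjusted at run time may be useful.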

