CV · Mar 9

From Reactive to Map-Based AI: Tuned Local LLMs for Semantic Zone Inference in Object-Goal Navigation

arXiv:2603.08086v1
AI Analysis

This work gives object-goal navigation agents a more efficient and systematic exploration strategy by integrating semantic mapping, a result relevant to robotics and embodied AI researchers.

This paper addresses the limitations of reactive LLM-based agents in Object-Goal Navigation by proposing a Map-Based AI framework. It integrates a LoRA-tuned Llama-2 model to infer semantic zone categories and target existence probabilities from object observations, which are then used to guide exploration via TSP optimization. The approach significantly outperforms traditional frontier exploration and reactive LLM baselines in the AI2-THOR simulator, achieving superior Success Rate and SPL.
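As a rough illustration of how zone probabilities could feed a TSP-style visiting order, the sketch below keeps only zones whose inferred target probability clears a threshold and then searches exhaustively for the shortest open tour from the agent's position. Everything here is an assumption made for illustration: the function name `plan_zone_tour`, the `min_prob` threshold, the Euclidean distances, and the brute-force search are hypothetical and are not taken from the paper, which does not describe its TSP formulation in this summary.

```python
from itertools import permutations
from math import dist


def plan_zone_tour(start, zones, min_prob=0.2):
    """Return (order, length) of the shortest open tour over likely zones.

    `zones` maps a zone name to ((x, y), probability). The threshold and the
    straight-line distance are illustrative assumptions, not the paper's
    settings. Exhaustive search is only practical for a handful of zones.
    """
    # Keep only zones where the target is considered likely enough.
    candidates = {name: pos for name, (pos, p) in zones.items() if p >= min_prob}
    if not candidates:
        return [], 0.0

    best_order, best_len = None, float("inf")
    for order in permutations(candidates):
        length, here = 0.0, start
        for name in order:
            length += dist(here, candidates[name])
            here = candidates[name]
        if length < best_len:
            best_order, best_len = order, length
    return list(best_order), best_len


zones = {
    "kitchen": ((4.0, 0.0), 0.7),
    "bedroom": ((1.0, 0.0), 0.5),
    "bathroom": ((2.0, 0.0), 0.05),  # below the threshold, so skipped
}
order, length = plan_zone_tour((0.0, 0.0), zones)
print(order, length)
```

A real system would use path lengths on the topological graph rather than straight-line distances, and a heuristic solver once the number of zones grows.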

Object-Goal Navigation (ObjectNav) requires an agent to find and navigate to a target object category in unknown environments. While recent Large Language Model (LLM)-based agents exhibit zero-shot reasoning, they often rely on a "reactive" paradigm that lacks explicit spatial memory, leading to redundant exploration and myopic behaviors. To address these limitations, we propose a transition from reactive AI to "Map-Based AI" by integrating LLM-based semantic inference with a hybrid topological-grid mapping system. Our framework employs a Llama-2 model fine-tuned via Low-Rank Adaptation (LoRA) to infer semantic zone categories and target existence probabilities from verbalized object observations. In this study, a "zone" is defined as a functional area described by the set of observed objects, providing crucial semantic co-occurrence cues for finding the target. This semantic information is integrated into a topological graph, enabling the agent to prioritize high-probability areas and perform systematic exploration via Traveling Salesman Problem (TSP) optimization. Evaluations in the AI2-THOR simulator demonstrate that our approach significantly outperforms traditional frontier exploration and reactive LLM baselines, achieving a higher Success Rate (SR) and Success weighted by Path Length (SPL).
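For readers unfamiliar with the two metrics, the sketch below computes SR and SPL using the commonly used definition of SPL, in which each successful episode contributes the ratio of the shortest-path length to the larger of the path actually taken and the shortest path. The episode tuples and function names are hypothetical, the numbers are made up for illustration, and the paper's exact evaluation protocol is not stated in this abstract.

```python
def success_rate(episodes):
    """Fraction of episodes in which the target was reached."""
    return sum(1 for success, _, _ in episodes if success) / len(episodes)


def spl(episodes):
    """Success weighted by Path Length.

    Each episode is (success, shortest_path_length, agent_path_length).
    Assumes shortest_path_length > 0 for every episode.
    """
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            # Efficiency is 1.0 for an optimal path and shrinks as the
            # agent's path grows longer than the shortest one.
            total += shortest / max(taken, shortest)
    return total / len(episodes)


# Illustrative episodes only, not results from the paper.
episodes = [
    (True, 5.0, 10.0),   # success, but twice the optimal length
    (True, 4.0, 4.0),    # success along an optimal path
    (False, 3.0, 9.0),   # failure contributes zero
]
print(success_rate(episodes), spl(episodes))
```

Because failed episodes contribute zero and inefficient successes are discounted, SPL is never larger than SR, which is why the two are typically reported together.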
