RO · May 5

From Language to Logic: A Theoretical Architecture for VLM-Grounded Safe Navigation

Kristy Sakano, Kalonji Harrington, Mumu Xu

arXiv:2605.04327 · 50.9

Predicted impact: top 44% in RO · last 90 days · Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of integrating human-provided safety rules into autonomous navigation, but it is purely theoretical with no empirical validation.

The paper proposes an architecture that translates natural-language safety rules into Signal Temporal Logic specifications for autonomous navigation in unstructured outdoor environments, using Vision-Language Models for zero-shot scene understanding. No experimental results are provided.

We propose an architecture for integrating high-level, human-provided safety rules and operator-aligned semantic preferences into autonomous robot navigation in unstructured outdoor environments. In our approach, natural-language rules are translated into Signal Temporal Logic (STL) specifications that guide planning and navigation during runtime. Persistent, environment-centric rules and terrain preferences are grounded into a 2D cost map, while temporally dynamic requirements are expressed as STL specifications to be monitored during runtime. We hypothesize the use of Vision-Language Models (VLMs) for zero-shot scene understanding, enabling mapping between human instructions, semantic features, and environmental constraints. Within this framework, we construct an illustrative navigation model that is designed to satisfy a set of STL-encoded specifications and soft operator preferences through formal satisfaction metrics embedded into environmental properties and runtime monitoring.
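
To make the two grounding paths in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes per-cell terrain labels are already available (for example from a VLM), and every name in it (`TERRAIN_COST`, `build_cost_map`, `clearance_margins`, `goal_margins`, the specific cost values) is hypothetical. Persistent terrain preferences become a 2D cost map, and two temporal requirements are scored with the standard quantitative STL semantics over a finite trace: "always" is the minimum margin, "eventually" is the maximum.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical operator preferences: lower cost means preferred terrain,
# infinite cost marks terrain a persistent rule forbids.
TERRAIN_COST: Dict[str, float] = {
    "path": 1.0,
    "grass": 3.0,
    "mud": 8.0,
    "water": float("inf"),
}


def build_cost_map(labels: Sequence[Sequence[str]]) -> List[List[float]]:
    """Ground per-cell semantic labels into a 2D cost map."""
    return [[TERRAIN_COST[label] for label in row] for row in labels]


def clearance_margins(trace: Sequence[Point], obstacle: Point, d_min: float) -> List[float]:
    """Per-step margin of the predicate dist(robot, obstacle) >= d_min."""
    ox, oy = obstacle
    return [math.hypot(x - ox, y - oy) - d_min for x, y in trace]


def goal_margins(trace: Sequence[Point], goal: Point, radius: float) -> List[float]:
    """Per-step margin of the predicate dist(robot, goal) <= radius."""
    gx, gy = goal
    return [radius - math.hypot(x - gx, y - gy) for x, y in trace]


def rho_always(margins: Sequence[float]) -> float:
    """Robustness of G(predicate) on a finite trace: the worst-case margin."""
    return min(margins)


def rho_eventually(margins: Sequence[float]) -> float:
    """Robustness of F(predicate) on a finite trace: the best-case margin."""
    return max(margins)


if __name__ == "__main__":
    cost_map = build_cost_map([["path", "grass"], ["mud", "water"]])
    trace = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    safe = rho_always(clearance_margins(trace, obstacle=(1.0, 2.0), d_min=1.0))
    reach = rho_eventually(goal_margins(trace, goal=(2.0, 0.0), radius=0.5))
    print(cost_map, safe, reach)
```

A positive robustness value means the specification holds on the trace with that much margin, and a negative value means it is violated. A runtime monitor in this style would recompute the values as new poses arrive.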
