RO · AI · Mar 13

Evaluating VLMs' Spatial Reasoning Over Robot Motion: A Step Towards Robot Planning with Motion Preferences

arXiv:2603.13100 · 49.2 · h-index: 1
Predicted impact: top 40% in RO · last 90 days · Originality: Synthesis-oriented
AI Analysis

This work addresses the integration of VLMs into robot planning with motion preferences. Its contribution is incremental: it evaluates existing models rather than proposing a new method.

The paper evaluates the spatial reasoning of four state-of-the-art Vision-Language Models (VLMs) over robot motion. With the highest-performing querying method, Qwen2.5-VL achieves 71.4% accuracy zero-shot and 75% on a smaller model after fine-tuning, while GPT-4o performs worse.

Understanding user instructions and the spatial relations among objects in the surrounding environment is crucial for intelligent robot systems that assist humans in various tasks. The natural language and spatial reasoning capabilities of Vision-Language Models (VLMs) have the potential to improve how well robot planners generalize to new tasks, objects, and motion specifications. While foundation models have been applied to task planning, it remains unclear to what degree they possess the spatial reasoning required to enforce user preferences or constraints on motion, such as desired distances from objects, topological properties, or motion style preferences. In this paper, we evaluate the capability of four state-of-the-art VLMs at spatial reasoning over robot motion, using four different querying methods. Our results show that, with the highest-performing querying method, Qwen2.5-VL achieves 71.4% accuracy zero-shot and 75% on a smaller model after fine-tuning, while GPT-4o performs worse. We evaluate two types of motion preferences (object-proximity and path-style), and we also analyze the trade-off between accuracy and computation cost, measured in number of tokens. This work shows some promise for integrating VLMs into robot motion planning pipelines.

