ROAI | Nov 10, 2025

Using Vision Language Models as Closed-Loop Symbolic Planners for Robotic Applications: A Control-Theoretic Perspective

arXiv:2511.07410v1 | h-index: 23
Originality: Incremental advance
AI Analysis

This work addresses the challenge of reliable high-level planning for robotic applications, but the advance is incremental: it builds on existing VLM methods rather than introducing a new paradigm.

The paper tackles the problem of using Vision Language Models (VLMs) for closed-loop symbolic planning in robotics, which is challenging because these models can make unpredictable errors. Through controlled experiments, it investigates how the control horizon and warm-starting affect planner performance.

Large Language Models (LLMs) and Vision Language Models (VLMs) have been widely used for embodied symbolic planning. Yet, how to effectively use these models for closed-loop symbolic planning remains largely unexplored. Because they operate as black boxes, LLMs and VLMs can produce unpredictable or costly errors, making their use in high-level robotic planning especially challenging. In this work, we investigate how to use VLMs as closed-loop symbolic planners for robotic applications from a control-theoretic perspective. Concretely, we study how the control horizon and warm-starting impact the performance of VLM symbolic planners. We design and conduct controlled experiments to gain insights that are broadly applicable to utilizing VLMs as closed-loop symbolic planners, and we discuss recommendations that can help improve the performance of VLM symbolic planners.
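To make the two knobs named in the abstract concrete, here is a minimal sketch of a receding-horizon planning loop. It is an illustration under assumptions, not the paper's implementation: the abstract does not define its terms, so "control horizon" is taken in the usual model-predictive-control sense (how many planned steps are executed before replanning) and "warm-starting" as passing the unexecuted tail of the previous plan back to the planner. The names `closed_loop_plan`, `toy_planner`, and `toy_execute` are hypothetical, and the planner is a deterministic stub standing in for a VLM call.

```python
from typing import Callable, List, Optional, Sequence, Tuple

State = Tuple[str, ...]  # toy symbolic state: blocks placed so far
Action = str
# Hypothetical planner interface: (state, optional warm-start hint) -> plan
Planner = Callable[[State, Optional[Sequence[Action]]], List[Action]]


def closed_loop_plan(
    state: State,
    goal_reached: Callable[[State], bool],
    planner: Planner,
    execute: Callable[[State, Action], State],
    control_horizon: int,
    warm_start: bool = True,
    max_replans: int = 20,
) -> Tuple[State, List[Action]]:
    """Plan, execute `control_horizon` steps, observe, and replan."""
    if control_horizon < 1:
        raise ValueError("control_horizon must be at least 1")
    executed: List[Action] = []
    previous_tail: Optional[List[Action]] = None
    for _ in range(max_replans):
        if goal_reached(state):
            break
        # Warm-start (assumed meaning): reuse the unexecuted tail as a hint.
        hint = previous_tail if warm_start else None
        plan = planner(state, hint)
        if not plan:
            break  # planner gave up; stop rather than loop forever
        # Execute only the first `control_horizon` actions of the plan.
        for action in plan[:control_horizon]:
            state = execute(state, action)
            executed.append(action)
        previous_tail = plan[control_horizon:]
    return state, executed


GOAL: State = ("A", "B", "C")


def toy_planner(state: State, hint: Optional[Sequence[Action]]) -> List[Action]:
    # Stub standing in for a VLM: trust a non-empty hint, else plan from scratch.
    if hint:
        return list(hint)
    return [f"place_{b}" for b in GOAL if b not in state]


def toy_execute(state: State, action: Action) -> State:
    # Deterministic toy dynamics: placing a block appends it to the state.
    return state + (action.split("_", 1)[1],)


final_state, actions = closed_loop_plan(
    (), lambda s: s == GOAL, toy_planner, toy_execute, control_horizon=1
)
```

In this sketch, a control horizon of 1 queries the planner before every action, while a control horizon equal to the plan length executes the whole plan open-loop after a single query. The toy dynamics are deterministic, so both settings reach the goal; the trade-off studied in the paper only becomes visible when planning or execution can fail.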

