RO · AI · CV · HC · Jan 7, 2025

VLM-driven Behavior Tree for Context-aware Task Planning

arXiv:2501.03968v2 · 8 citations · h-index: 14
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of enabling robots to perform tasks adaptively in dynamic visual settings. The advance is incremental, since it builds on existing LLM-based BT generation methods.

The paper tackles the problem of generating context-aware robot behavior in visually complex environments. It uses Vision-Language Models to create and edit Behavior Trees that contain visual condition nodes, and it validates the framework in a real-world cafe scenario to show both feasibility and limitations.

The use of Large Language Models (LLMs) for generating Behavior Trees (BTs) has recently gained attention in the robotics community, yet remains in its early stages of development. In this paper, we propose a novel framework that leverages Vision-Language Models (VLMs) to interactively generate and edit BTs that address visual conditions, enabling context-aware robot operations in visually complex environments. A key feature of our approach lies in the conditional control through self-prompted visual conditions. Specifically, the VLM generates BTs with visual condition nodes, where conditions are expressed as free-form text. Another VLM process integrates the text into its prompt and evaluates the conditions against real-world images during robot execution. We validated our framework in a real-world cafe scenario, demonstrating both its feasibility and limitations.
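The idea of a visual condition node can be illustrated with a minimal sketch. It is not the authors' implementation: the class names, the `evaluate` callback signature, and the cafe-style condition text are all hypothetical, and a set-membership stub stands in for the second VLM process that would check the free-form text against a camera image. Node status is simplified to success or failure, with no running state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


# Hypothetical evaluator signature: (condition_text, image) -> bool.
# In the paper this role is played by a VLM; here it is injected as a callback.
Evaluator = Callable[[str, object], bool]


@dataclass
class Action:
    """Leaf node that records its name when executed and always succeeds."""
    name: str
    log: List[str]

    def tick(self, image: object, evaluate: Evaluator) -> Status:
        self.log.append(self.name)
        return Status.SUCCESS


@dataclass
class VisualCondition:
    """Leaf node whose condition is free-form text checked against an image."""
    text: str

    def tick(self, image: object, evaluate: Evaluator) -> Status:
        return Status.SUCCESS if evaluate(self.text, image) else Status.FAILURE


@dataclass
class Sequence:
    """Succeeds only if every child succeeds, evaluated left to right."""
    children: list

    def tick(self, image: object, evaluate: Evaluator) -> Status:
        for child in self.children:
            if child.tick(image, evaluate) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


@dataclass
class Fallback:
    """Succeeds as soon as one child succeeds, evaluated left to right."""
    children: list

    def tick(self, image: object, evaluate: Evaluator) -> Status:
        for child in self.children:
            if child.tick(image, evaluate) is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


def stub_evaluate(text: str, image: object) -> bool:
    """Stand-in for the VLM: the 'image' is just a set of true statements."""
    return text in image


def run_once(image: set) -> List[str]:
    """Build a small hypothetical tree, tick it once, return executed actions."""
    log: List[str] = []
    tree = Fallback([
        Sequence([
            VisualCondition("an empty cup is on the table"),
            Action("collect_cup", log),
        ]),
        Action("wait", log),
    ])
    tree.tick(image, stub_evaluate)
    return log
```

Calling `run_once({"an empty cup is on the table"})` runs the `collect_cup` action, while `run_once(set())` falls through to `wait`. Passing the evaluator in as a callback keeps the tree structure independent of whichever model judges the condition text.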

Code Implementations: 1 repo