AI · RO · Jul 26, 2024

Wonderful Team: Zero-Shot Physical Task Planning with Visual LLMs

arXiv:2407.19094v6 · 12 citations · h-index: 13
Originality: Highly original
AI Analysis

This paper addresses robotic manipulation planning for researchers and practitioners. It proposes an integrated approach in which a VLLM handles the entire planning process, and it reports higher success rates than prior methods.

The paper tackles robotic task planning by introducing Wonderful Team, a multi-agent Vision Large Language Model framework that performs zero-shot high-level planning from images and task descriptions. It reports an average 40% improvement in success rate on VimaBench over prior methods.

We introduce Wonderful Team, a multi-agent Vision Large Language Model (VLLM) framework for executing high-level robotic planning in a zero-shot regime. In our context, zero-shot high-level planning means that for a novel environment, we provide a VLLM with an image of the robot's surroundings and a task description, and the VLLM outputs the sequence of actions necessary for the robot to complete the task. Unlike previous methods for high-level visual planning for robotic manipulation, our method uses VLLMs for the entire planning process, enabling a more tightly integrated loop between perception, control, and planning. As a result, Wonderful Team's performance on real-world semantic and physical planning tasks often exceeds that of methods that rely on separate vision systems. For example, we see an average 40% success rate improvement on VimaBench over prior methods such as NLaP, an average 30% improvement over Trajectory Generators on tasks from the Trajectory Generator paper, including drawing and wiping a plate, and an average 70% improvement over Trajectory Generators on a new set of semantic reasoning tasks including environment rearrangement with implicit linguistic constraints. We hope these results highlight the rapid improvements of VLLMs in the past year, and motivate the community to consider VLLMs as an option for some high-level robotic planning problems in the future.
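The input/output contract described in the abstract (an image plus a task description go in, a sequence of actions comes out) can be illustrated with a minimal sketch. This is not the paper's implementation: the `Action` fields, the JSON reply format, the `plan` function, and `stub_vllm` are all hypothetical names chosen for illustration, and the single model call here does not reproduce the multi-agent structure of Wonderful Team.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    name: str    # hypothetical action vocabulary, e.g. "pick" or "place"
    target: str  # object or location the action refers to


# A VLLM is modelled as any callable taking (image_bytes, prompt) and returning text.
VLLM = Callable[[bytes, str], str]


def plan(vllm: VLLM, image: bytes, task: str) -> List[Action]:
    """Send the image and task to the model and parse its reply into actions."""
    prompt = (
        "Task: " + task + "\n"
        'Reply with a JSON list of {"name": ..., "target": ...} steps.'
    )
    reply = vllm(image, prompt)
    return [Action(step["name"], step["target"]) for step in json.loads(reply)]


def stub_vllm(image: bytes, prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline (assumption)."""
    return json.dumps([
        {"name": "pick", "target": "red block"},
        {"name": "place", "target": "blue bowl"},
    ])


actions = plan(stub_vllm, b"fake-image-bytes", "Put the red block in the blue bowl")
for a in actions:
    print(a.name, a.target)
```

Replacing `stub_vllm` with a call to an actual vision-language model would keep the same contract. A real system would also need to validate the reply and map each step onto robot control commands.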

Code Implementations: 1 repo