RO · CV · Jan 8, 2025

Robotic Programmer: Video Instructed Policy Code Generation for Robotic Manipulation

arXiv:2501.04268v1 · 5 citations · h-index: 36 · IROS
Originality: Incremental advance
AI Analysis

This paper addresses the problem of enabling robots to perform diverse manipulation tasks without task-specific training. It builds incrementally on existing policy code generation methods.

The authors tackle zero-shot generalization in robotic manipulation by proposing Robotic Programmer (RoboPro), a foundation model that generates policy code from visual inputs and free-form instructions. Its reported zero-shot success rate on RLBench is state of the art, surpassing GPT-4o by 11.6%.

Zero-shot generalization across various robots, tasks, and environments remains a significant challenge in robotic manipulation. Policy code generation methods use executable code to connect high-level task descriptions and low-level action sequences, leveraging the generalization capabilities of large language models and atomic skill libraries. In this work, we propose Robotic Programmer (RoboPro), a robotic foundation model that perceives visual information and follows free-form instructions to perform robotic manipulation with policy code in a zero-shot manner. To address the low efficiency and high cost of collecting runtime code data for robotic tasks, we devise Video2Code, which synthesizes executable code from extensive in-the-wild videos using an off-the-shelf vision-language model and a code-domain large language model. Extensive experiments show that RoboPro achieves state-of-the-art zero-shot performance on robotic manipulation in both simulators and real-world environments. Specifically, the zero-shot success rate of RoboPro on RLBench surpasses that of the state-of-the-art model GPT-4o by 11.6%, which is even comparable to a strong supervised training baseline. Furthermore, RoboPro is robust to variations in API formats and skill sets.
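To make the idea of policy code concrete, the following is a minimal sketch of how a short program can connect a high-level instruction to a sequence of low-level actions through an atomic skill library. It is not RoboPro's actual API: the `SkillLibrary` class, its methods (`locate`, `move_to`, `grasp`, `release`), the `policy_put_in` function, and the object names and coordinates are all hypothetical, and the "robot" here is a toy that only records actions in a log.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Position = Tuple[float, float, float]


@dataclass
class SkillLibrary:
    """Toy stand-in for an atomic skill library (all names are hypothetical)."""

    object_positions: Dict[str, Position]
    gripper: Position = (0.0, 0.0, 0.0)
    holding: Optional[str] = None
    log: List[str] = field(default_factory=list)

    def locate(self, name: str) -> Position:
        # Perception skill: look up where an object is.
        return self.object_positions[name]

    def move_to(self, position: Position) -> None:
        # Motion skill: move the gripper to a target position.
        self.gripper = position
        self.log.append(f"move_to{position}")

    def grasp(self, name: str) -> None:
        # Grasp skill: only succeeds if the gripper is at the object.
        if self.object_positions[name] != self.gripper:
            raise RuntimeError(f"gripper is not at {name}")
        self.holding = name
        self.log.append(f"grasp({name})")

    def release(self) -> None:
        # Release skill: the held object ends up at the gripper position.
        if self.holding is None:
            raise RuntimeError("nothing to release")
        self.object_positions[self.holding] = self.gripper
        self.log.append(f"release({self.holding})")
        self.holding = None


def policy_put_in(skills: SkillLibrary, obj: str, target: str) -> None:
    """Policy code of the kind a model might emit for 'put <obj> in <target>'."""
    skills.move_to(skills.locate(obj))
    skills.grasp(obj)
    skills.move_to(skills.locate(target))
    skills.release()


# Hypothetical scene: a block and a drawer at made-up coordinates.
skills = SkillLibrary({"block": (0.1, 0.2, 0.0), "drawer": (0.5, 0.0, 0.1)})
policy_put_in(skills, "block", "drawer")
print(skills.log)
```

In this sketch the policy function contains no task-specific control logic. It only composes skills, which illustrates why such code could in principle transfer to a different skill set or API format by swapping the library underneath.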

