RO | AI | Mar 19, 2024

Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers

arXiv:2403.12943v2 | 58 citations | Robotics: Science and Systems
Originality: Highly original
AI Analysis

This work addresses the challenge of specifying tasks for multi-task robotic manipulation systems without relying on text: the task is conveyed by a human demonstration video instead, potentially reducing the need for explicit programming.

The paper tackles the problem of enabling robots to learn manipulation tasks by observing human videos, introducing Vid2Robot as an end-to-end video-conditioned policy that achieves over 20% improvement over BC-Z when using human prompt videos.

Large-scale multi-task robotic manipulation systems often rely on text to specify the task. In this work, we explore whether a robot can learn by observing humans. To do so, the robot must understand a person's intent and perform the inferred task despite differences in the embodiments and environments. We introduce Vid2Robot, an end-to-end video-conditioned policy that takes human videos demonstrating manipulation tasks as input and produces robot actions. Our model is trained with a large dataset of prompt video-robot trajectory pairs to learn unified representations of human and robot actions from videos. Vid2Robot uses cross-attention transformer layers between video features and the current robot state to produce the actions and perform the same task as shown in the video. We use auxiliary contrastive losses to align the prompt and robot video representations for better policies. We evaluate Vid2Robot on real-world robots and observe over 20% improvement over BC-Z when using human prompt videos. Further, we also show cross-object motion transfer ability that enables video-conditioned policies to transfer a motion observed on one object in the prompt video to another object in the robot's own environment. Videos available at https://vid2robot.github.io
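
The abstract says that Vid2Robot applies cross-attention between prompt-video features and the current robot state. As a rough illustration of that mechanism only, the following is a minimal single-head sketch in plain Python. It assumes the robot-state tokens act as queries and the prompt-video tokens act as keys and values; that assignment, the token shapes, the absence of learned projections, and all names (`cross_attention`, `prompt_tokens`, `state_tokens`) are assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries: one vector per robot-state token (assumed role), shape [Nq][d]
    keys:    one vector per prompt-video token (assumed role), shape [Nk][d]
    values:  one vector per prompt-video token, shape [Nk][dv]
    Returns one output vector per query; each output is a weighted
    average of the value vectors.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    value_dim = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every prompt-video token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Mix the value vectors according to the attention weights.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(value_dim)
        ]
        outputs.append(mixed)
    return outputs


# Toy data: three prompt-video tokens and two robot-state tokens (hypothetical).
prompt_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state_tokens = [[0.5, 0.5], [2.0, 0.0]]
fused = cross_attention(state_tokens, prompt_tokens, prompt_tokens)
```

Here `fused` holds one prompt-conditioned vector per robot-state token. A real policy would add learned query, key, and value projections, multiple heads, and an action head on top, none of which are specified in the abstract.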
