RO · AI · CV · LG · Oct 18, 2022

From Play to Policy: Conditional Behavior Generation from Uncurated Robot Data

arXiv:2210.10047v3 · 141 citations · h-index: 41
Originality: Highly original
AI Analysis

This work addresses the challenge of learning policies from diverse, multi-modal robot play data that carries no task labels or reward information.

The paper tackles the problem of extracting task-centric behaviors from noisy, uncurated robot demonstration data by introducing Conditional Behavior Transformers (C-BeT). On simulated benchmarks, C-BeT improves upon prior state-of-the-art methods by an average of 45.7%, and the authors also demonstrate learning on a real-world robot without task labels or rewards.

While large-scale sequence modeling from offline data has led to impressive performance gains in natural language and image generation, directly translating such ideas to robotics has been challenging. One critical reason for this is that uncurated robot demonstration data, i.e. play data, collected from non-expert human demonstrators are often noisy, diverse, and distributionally multi-modal. This makes extracting useful, task-centric behaviors from such data a difficult generative modeling problem. In this work, we present Conditional Behavior Transformers (C-BeT), a method that combines the multi-modal generation ability of Behavior Transformer with future-conditioned goal specification. On a suite of simulated benchmark tasks, we find that C-BeT improves upon prior state-of-the-art work in learning from play data by an average of 45.7%. Further, we demonstrate for the first time that useful task-centric behaviors can be learned on a real-world robot purely from play data without any task labels or reward information. Robot videos are best viewed on our project website: https://play-to-policy.github.io
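As an illustration of what "future-conditioned goal specification" can mean for unlabeled play data, here is a minimal sketch, not the authors' implementation. It assumes a trajectory is stored as a time-ordered list of (observation, action) pairs and that the goal is taken from observations later in the same trajectory; the function name `sample_future_conditioned` and the parameters `window` and `goal_len` are hypothetical.

```python
import random


def sample_future_conditioned(trajectory, window=2, goal_len=1, rng=None):
    """Draw one (observations, goal, actions) tuple from an unlabeled trajectory.

    trajectory: list of (observation, action) pairs in time order.
    window:     number of consecutive steps used as policy context (assumed).
    goal_len:   number of later observations used as the goal (assumed).
    """
    rng = rng or random.Random()
    n = len(trajectory)
    if n < window + goal_len:
        raise ValueError("trajectory too short for the requested window and goal")

    # Pick the context window, leaving room for a goal after it.
    start = rng.randrange(0, n - window - goal_len + 1)
    end = start + window

    # The goal comes from the future: it starts at or after the window's end.
    goal_start = rng.randrange(end, n - goal_len + 1)

    obs = [o for o, _ in trajectory[start:end]]
    actions = [a for _, a in trajectory[start:end]]
    goal = [o for o, _ in trajectory[goal_start:goal_start + goal_len]]
    return obs, goal, actions


# Toy trajectory: observation is the time index, action is a string label.
toy = [(t, f"a{t}") for t in range(10)]
obs, goal, actions = sample_future_conditioned(toy, rng=random.Random(0))
print(obs, goal, actions)
```

Because the goal is drawn from the same trajectory, no task label or reward is needed to build a training tuple; a goal-conditioned policy would then be trained to predict `actions` from `obs` and `goal`.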

