RO LG Oct 5, 2022

Visual Backtracking Teleoperation: A Data Collection Protocol for Offline Image-Based Reinforcement Learning

arXiv:2210.02343v1
14 citations
h-index: 30
Originality: Incremental advance
AI Analysis

This work addresses data collection efficiency for offline image-based reinforcement learning in robotics, offering a domain-specific improvement for deformable manipulation tasks.

The paper addresses how to collect teleoperation data efficiently for learning image-based value functions and policies in sparse reward robotic tasks. It introduces Visual Backtracking Teleoperation (VBT), a protocol that deliberately collects visually similar failures, recoveries, and successes. On a real-robot T-shirt grasping task, offline reinforcement learning on VBT data outperforms behavior cloning on successful demonstrations by 13% when both methods are given 60 minutes of data.

We consider how to most efficiently leverage teleoperator time to collect data for learning robust image-based value functions and policies for sparse reward robotic tasks. To accomplish this goal, we modify the process of data collection to include more than just successful demonstrations of the desired task. Instead we develop a novel protocol that we call Visual Backtracking Teleoperation (VBT), which deliberately collects a dataset of visually similar failures, recoveries, and successes. VBT data collection is particularly useful for efficiently learning accurate value functions from small datasets of image-based observations. We demonstrate VBT on a real robot to perform continuous control from image observations for the deformable manipulation task of T-shirt grasping. We find that by adjusting the data collection process we improve the quality of both the learned value functions and policies over a variety of baseline methods for data collection. Specifically, we find that offline reinforcement learning on VBT data outperforms standard behavior cloning on successful demonstration data by 13% when both methods are given equal-sized datasets of 60 minutes of data from the real robot.
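To make the idea of a single episode containing failures, recoveries, and a final success more concrete, the following is a minimal sketch of how such an episode might be turned into sparse-reward transitions for offline reinforcement learning. It is not the authors' implementation: the `Step` record, the phase labels, the integer stand-ins for image observations, the reward of 1.0 on success, and the discount factor are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    """One teleoperated time step (hypothetical record layout)."""
    obs_id: int    # stand-in for an image observation
    action: float  # stand-in for a continuous control action
    phase: str     # assumed labels: "failure", "recovery", or "success"
    success: bool  # True only on the step where the task is completed


Transition = Tuple[int, float, float, int, bool]


def to_transitions(episode: List[Step]) -> List[Transition]:
    """Convert an episode into (obs, action, reward, next_obs, done) tuples.

    The reward is sparse: 1.0 on the successful step and 0.0 elsewhere
    (an assumed reward scheme, not taken from the paper).
    """
    transitions = []
    for t, step in enumerate(episode):
        reward = 1.0 if step.success else 0.0
        is_last = t == len(episode) - 1
        next_obs = step.obs_id if is_last else episode[t + 1].obs_id
        transitions.append(
            (step.obs_id, step.action, reward, next_obs, step.success or is_last)
        )
    return transitions


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Monte Carlo return at every step, computed backwards through the episode."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


# A toy episode: a failed attempt, a backtracking recovery, then a success.
episode = [
    Step(0, 0.1, "failure", False),
    Step(1, 0.2, "failure", False),
    Step(2, -0.2, "recovery", False),
    Step(3, 0.3, "success", False),
    Step(4, 0.4, "success", True),
]

transitions = to_transitions(episode)
rewards = [tr[2] for tr in transitions]
returns = discounted_returns(rewards, gamma=0.5)
```

In this toy setup the failure and recovery steps receive lower returns than the steps just before success, even though the observations in a real dataset would look similar. That contrast is the kind of signal a value function cannot get from successful demonstrations alone.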
