The CoSTAR Block Stacking Dataset: Learning with Workspace Constraints
This work addresses robot manipulation in less constrained environments and is aimed at robotics researchers. Its contribution is incremental, since it builds on existing datasets and methods.
The authors address the failure of neural network grasping algorithms under realistic workspace constraints by introducing the JHU CoSTAR Block Stacking Dataset, which contains nearly 12,000 stacking attempts and over 2 million frames. They establish a baseline through an automated search with a novel HyperTree MetaModel, which yields reasonable 3D pose predictions.
A robot can now grasp an object more effectively than ever before, but once it has the object, what happens next? We show that a mild relaxation of the task and workspace constraints implicit in existing object grasping datasets can cause neural-network-based grasping algorithms to fail on even a simple block stacking task when executed under more realistic circumstances. To address this, we introduce the JHU CoSTAR Block Stacking Dataset (BSD), where a robot interacts with 5.1 cm colored blocks to complete an order-fulfillment-style block stacking task. It contains dynamic scenes and real time-series data in a less constrained environment than comparable datasets. There are nearly 12,000 stacking attempts and over 2 million frames of real data. We discuss the ways in which this dataset provides a valuable resource for a broad range of other topics of investigation. We find that hand-designed neural networks that work on prior datasets do not generalize to this task. Thus, to establish a baseline for this dataset, we demonstrate an automated search of neural-network-based models using a novel multiple-input HyperTree MetaModel, and find a final model that makes reasonable 3D pose predictions for grasping and stacking on our dataset. The CoSTAR BSD, code, and instructions are available at https://sites.google.com/site/costardataset.
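The abstract does not describe how the automated search works, so the following is only a minimal sketch of the general idea, using plain random search as a stand-in technique. It is not the HyperTree MetaModel. The search space, every option name in it, and the `toy_evaluate` objective are hypothetical and exist only so that the example runs.

```python
import random

# Hypothetical search space. These names and values are illustrative
# assumptions, not the HyperTree MetaModel's actual parameters.
SEARCH_SPACE = {
    "image_backbone": ["backbone_a", "backbone_b", "backbone_c"],
    "vector_branch_layers": [0, 1, 2, 3],
    "trunk_layers": [1, 2, 3, 4],
    "dropout": [0.0, 0.2, 0.5],
}


def sample_config(rng):
    """Draw one candidate configuration uniformly from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def random_search(evaluate, num_trials, seed=0):
    """Return the (config, score) pair with the lowest score seen."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(num_trials):
        config = sample_config(rng)
        # In a real search this would train a model briefly and
        # return a validation error for the pose prediction.
        score = evaluate(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config):
    """Stand-in objective so the sketch runs without any training."""
    return abs(config["trunk_layers"] - 3) + config["dropout"]
```

Calling `random_search(toy_evaluate, num_trials=20)` returns the best sampled configuration and its score. In practice, `toy_evaluate` would be replaced by a function that trains a candidate model on the dataset and reports its validation error.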