Self-supervised Transfer Learning for Instance Segmentation through Physical Interaction
The work addresses the time-consuming manual labeling required for new environments in robotics, though the contribution is incremental because it builds on existing methods such as DeepMask.
The paper tackles instance segmentation of unknown objects in robotics with a self-supervised transfer learning approach: a robot pushes objects and uses optical flow to generate training labels. The resulting fine-tuned network improves average precision by 9.5% over a DeepMask baseline trained on COCO.
Instance segmentation of unknown objects from images is regarded as relevant for several robot skills including grasping, tracking and object sorting. Recent results in computer vision have shown that large hand-labeled datasets enable high segmentation performance. To overcome the time-consuming process of manually labeling data for new environments, we present a transfer learning approach for robots that learn to segment objects by interacting with their environment in a self-supervised manner. Our robot pushes unknown objects on a table and uses information from optical flow to create training labels in the form of object masks. To achieve this, we fine-tune an existing DeepMask network for instance segmentation on the self-labeled training data acquired by the robot. We evaluate our trained network (SelfDeepMask) on a set of real images showing challenging and cluttered scenes with novel objects. Here, SelfDeepMask outperforms the DeepMask network trained on the COCO dataset by 9.5% in average precision. Furthermore, we combine our approach with recent approaches for training with noisy labels in order to better cope with induced label noise.
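The labeling step described above, turning optical flow from a push into an object mask, can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a dense flow field is already available as an (H, W, 2) array, and the function names `flow_to_mask` and `mask_is_usable`, the magnitude threshold, and the size limits are all hypothetical choices made for illustration.

```python
import numpy as np


def flow_to_mask(flow: np.ndarray, min_magnitude: float = 1.0) -> np.ndarray:
    """Mark pixels whose flow magnitude exceeds a threshold as 'moved'.

    flow: (H, W, 2) array of per-pixel displacements between the frames
    before and after a push. The threshold is an assumed value.
    """
    magnitude = np.linalg.norm(flow, axis=-1)
    return magnitude > min_magnitude


def mask_is_usable(mask: np.ndarray, min_pixels: int = 50,
                   max_fraction: float = 0.5) -> bool:
    """Reject masks that are too small (likely noise) or too large
    (likely several objects or the whole scene moved). Both limits
    are assumptions, not values from the paper."""
    n_moved = int(mask.sum())
    return min_pixels <= n_moved <= max_fraction * mask.size


# Synthetic example: a 10 x 15 pixel patch shifted 3 pixels along x.
flow = np.zeros((64, 64, 2), dtype=np.float32)
flow[20:30, 10:25] = (3.0, 0.0)

mask = flow_to_mask(flow)
print(int(mask.sum()), mask_is_usable(mask))  # 150 True
```

Masks produced by simple thresholding of this kind will contain errors, which is consistent with the abstract's motivation for combining the approach with methods for training on noisy labels.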