Semi-Automatic Labeling for Deep Learning in Robotics
This method tackles the time-consuming, labor-intensive manual annotation of data for computer vision in robotics, making deep learning more automated and reliable. The contribution is incremental, as it builds on existing object detectors such as YOLO and SSD.
The paper addresses the creation of large labeled datasets for deep learning in robotics. It proposes a semi-automatic labeling method that uses a robot-mounted camera and augmented reality. Manual annotation of 1,000 frames takes slightly over 10 hours, while the method annotates about 35,000 frames in under one hour, a gain of about 450x. Compared with manual labeling, it also improves object detection precision and recall by about 15%.
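The 450x figure compares annotation throughput, not raw durations, because the manual and ARS runs cover different numbers of frames. The formula below assumes the gain factor $G$ is the ratio of per-frame annotation times, with symbols introduced here for illustration. Under that assumption, the reported numbers imply an ARS annotation time somewhat under one hour. This is a back-of-the-envelope reading, not a figure stated in the paper.

```latex
G = \frac{t_{\text{man}} / N_{\text{man}}}{t_{\text{ARS}} / N_{\text{ARS}}}
  = \frac{N_{\text{ARS}}}{N_{\text{man}}} \cdot \frac{t_{\text{man}}}{t_{\text{ARS}}}
  \approx \frac{35000}{1000} \cdot \frac{10\,\text{h}}{t_{\text{ARS}}}
  = \frac{350\,\text{h}}{t_{\text{ARS}}}

t_{\text{ARS}} < 1\,\text{h} \;\Rightarrow\; G > 350,
\qquad
G \approx 450 \;\Rightarrow\; t_{\text{ARS}} \approx \frac{350}{450}\,\text{h} \approx 0.8\,\text{h}
```

Because manual annotation takes slightly more than 10 h, the implied $t_{\text{ARS}}$ is slightly larger than 0.78 h. That remains consistent with "less than one hour".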
In this paper, we propose Augmented Reality Semi-automatic labeling (ARS), a semi-automatic method for creating large labeled datasets with minimal human intervention. It relies on a robot that moves a 2D camera, providing precise camera tracking, and on an augmented reality pen used to define the initial object bounding box. Deep learning for computer vision typically requires very large datasets. By removing the burden of generating annotated data from humans, we make it truly automated and reliable. With the ARS pipeline, we effortlessly created two novel datasets, one of electromechanical components (industrial scenario) and one of fruits (daily-living scenario). We used them to robustly train two state-of-the-art object detectors based on convolutional neural networks, YOLO and SSD. Conventional manual annotation of 1000 frames takes us slightly more than 10 hours. In contrast, ARS lets us annotate 9 sequences totaling about 35000 frames in less than one hour, a gain factor of about 450. Moreover, both the precision and the recall of object detection increase by about 15% with respect to manual labeling. All our software is available as a ROS package in a public repository, alongside the novel annotated datasets.
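The abstract does not detail how ARS turns one initial bounding box into labels for thousands of frames. Below is a minimal sketch of one plausible mechanism, written under explicit assumptions: the AR pen yields an axis-aligned 3D box in the world (robot base) frame, and the robot reports a world-to-camera pose for every frame. It also assumes an ideal pinhole camera with no lens distortion. For each frame, the box corners are projected and the tightest 2D rectangle is taken. The result is written in the Darknet/YOLO label format (`class x_center y_center width height`, normalized to [0, 1]). All function names are hypothetical and are not the API of the authors' ROS package.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Box2D = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max) in pixels


def box_corners(center: Vec3, size: Vec3) -> List[Vec3]:
    """Hypothetical helper: the 8 corners of an axis-aligned 3D box in the world frame."""
    hx, hy, hz = (s / 2.0 for s in size)
    cx, cy, cz = center
    return [(cx + sx * hx, cy + sy * hy, cz + sz * hz)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]


def project(p_w: Vec3, R_cw: Sequence[Vec3], t_cw: Vec3,
            fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection; (R_cw, t_cw) maps world coordinates into the camera frame."""
    x, y, z = (sum(R_cw[i][k] * p_w[k] for k in range(3)) + t_cw[i] for i in range(3))
    if z <= 0.0:
        raise ValueError("corner lies behind the camera")
    return fx * x / z + cx, fy * y / z + cy


def bbox_for_frame(corners: List[Vec3], R_cw: Sequence[Vec3], t_cw: Vec3,
                   K: Tuple[float, float, float, float],
                   width: int, height: int) -> Optional[Box2D]:
    """Tightest image-aligned box around the projected corners, clipped to the image."""
    fx, fy, cx, cy = K
    uv = [project(p, R_cw, t_cw, fx, fy, cx, cy) for p in corners]
    u0 = max(0.0, min(u for u, _ in uv))
    v0 = max(0.0, min(v for _, v in uv))
    u1 = min(float(width), max(u for u, _ in uv))
    v1 = min(float(height), max(v for _, v in uv))
    if u1 <= u0 or v1 <= v0:
        return None  # object falls outside this frame: emit no label
    return (u0, v0, u1, v1)


def to_yolo_line(box: Box2D, class_id: int, width: int, height: int) -> str:
    """Darknet/YOLO label line: class, box center, and box size, all normalized."""
    u0, v0, u1, v1 = box
    return (f"{class_id} {(u0 + u1) / 2 / width:.6f} {(v0 + v1) / 2 / height:.6f} "
            f"{(u1 - u0) / width:.6f} {(v1 - v0) / height:.6f}")


if __name__ == "__main__":
    # Hypothetical object: a 20 cm cube placed 2 m in front of the first camera pose.
    corners = box_corners(center=(0.0, 0.0, 2.0), size=(0.2, 0.2, 0.2))
    K = (500.0, 500.0, 320.0, 240.0)  # assumed intrinsics fx, fy, cx, cy
    I = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    # Two hypothetical world-to-camera poses, e.g. from robot kinematics.
    poses = [(I, (0.0, 0.0, 0.0)), (I, (-0.1, 0.0, 0.0))]
    for R, t in poses:
        box = bbox_for_frame(corners, R, t, K, 640, 480)
        if box is not None:
            print(to_yolo_line(box, 0, 640, 480))
```

The design keeps the human effort to a single 3D box. Every further label is pure geometry, so the labels are only as accurate as the camera tracking and calibration. This is consistent with the abstract's emphasis on precise robot-based camera tracking.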