SALT: A Semi-automatic Labeling Tool for RGB-D Video Sequences
This tool addresses the need for efficient data labeling in computer vision, particularly for 3D object pose estimation and segmentation tasks. The contribution is incremental, since it builds on existing annotation methods.
The paper tackles the problem of labeling large RGB-D video datasets by introducing SALT, a semi-automatic tool that reduces annotation time by a factor of up to 33.95 for bounding boxes and 8.55 for RGB segmentation while maintaining ground truth quality.
Large labeled data sets are an essential foundation of modern deep learning techniques. Therefore, there is an increasing need for tools that make it possible to label large amounts of data as intuitively as possible. In this paper, we introduce SALT, a tool to semi-automatically annotate RGB-D video sequences to generate 3D bounding boxes for full six Degrees of Freedom (DoF) object poses, as well as pixel-level instance segmentation masks for both RGB and depth. Besides bounding box propagation through various interpolation techniques and algorithmically guided instance segmentation, our pipeline also provides built-in pre-processing functionalities to facilitate the data set creation process. By making full use of SALT, annotation time can be reduced by a factor of up to 33.95 for bounding box creation and 8.55 for RGB segmentation without compromising the quality of the automatically generated ground truth.
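The abstract does not say which interpolation techniques SALT uses for bounding box propagation. As an illustration of the general idea only, the sketch below interpolates a 6-DoF box pose between two manually annotated keyframes, using linear interpolation for the position and spherical linear interpolation (slerp) for the orientation. The function names `slerp` and `propagate_box`, the keyframe tuple layout, and the (w, x, y, z) quaternion convention are assumptions made for this example and are not taken from the tool.

```python
import math


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    # q and -q encode the same rotation; flip one to follow the shorter arc.
    if dot < 0.0:
        q1 = tuple(-c for c in q1)
        dot = -dot
    dot = min(dot, 1.0)  # guard against rounding slightly above 1
    theta = math.acos(dot)
    if theta < 1e-6:
        # Nearly identical orientations: plain linear blend is stable here.
        q = tuple(a + t * (b - a) for a, b in zip(q0, q1))
    else:
        s = math.sin(theta)
        w0 = math.sin((1.0 - t) * theta) / s
        w1 = math.sin(t * theta) / s
        q = tuple(w0 * a + w1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def propagate_box(key_a, key_b, frame):
    """Interpolate a box pose for `frame` between two annotated keyframes.

    Each keyframe is a hypothetical tuple:
    (frame_index, position_xyz, orientation_quaternion_wxyz).
    """
    fa, pa, qa = key_a
    fb, pb, qb = key_b
    t = (frame - fa) / (fb - fa)  # normalized time in [0, 1]
    position = tuple(a + t * (b - a) for a, b in zip(pa, pb))
    return position, slerp(qa, qb, t)


# Keyframe at frame 0 (identity) and frame 10 (rotated 90 degrees about z).
half = math.radians(90.0) / 2.0
key_a = (0, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0))
key_b = (10, (1.0, 2.0, 3.0), (math.cos(half), 0.0, 0.0, math.sin(half)))
position, orientation = propagate_box(key_a, key_b, 5)
```

With schemes of this kind, an annotator only labels the keyframes and the intermediate frames are filled in automatically. This is the sort of saving that propagation aims for, although the speed-up factors quoted above refer to the full SALT pipeline, not to this sketch.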