Active Self-Training for Weakly Supervised 3D Scene Semantic Segmentation
This work addresses the time-consuming labeling process for 3D point clouds with a weakly supervised method that builds incrementally on existing approaches.
The paper tackles the problem of reducing annotation effort for 3D scene semantic segmentation by combining active learning with self-training, resulting in improved segmentation performance with only a small number of user annotations.
Since preparing labeled data for training point cloud semantic segmentation networks is time-consuming, weakly supervised approaches have been introduced that learn from only a small fraction of labeled data. These methods are typically based on learning with contrastive losses while automatically deriving per-point pseudo-labels from a sparse set of user-annotated labels. In this paper, our key observation is that the selection of which samples to annotate is as important as how these samples are used for training. Thus, we introduce a method for weakly supervised segmentation of 3D scenes that combines self-training with active learning. The active learning selects points for annotation that are likely to improve the performance of the trained model, while the self-training makes efficient use of the user-provided labels for learning the model. We demonstrate that our approach provides improvements in scene segmentation over previous works and baselines, while requiring only a small number of user annotations.
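To make the interplay of the two components concrete, the following is a minimal sketch of one round of point selection and pseudo-labeling. The abstract does not state the selection criterion or the pseudo-labeling rule, so this sketch assumes entropy-based uncertainty sampling for the active learning step and a confidence threshold for the self-training step. The function names, the `budget` and `threshold` parameters, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
import numpy as np

def entropy(probs):
    """Per-point predictive entropy; probs has shape (n_points, n_classes)."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=1)

def select_for_annotation(probs, labeled_mask, budget):
    """Active learning step (assumed criterion): return the indices of the
    `budget` unlabeled points with the highest predictive entropy."""
    scores = entropy(probs)
    scores[labeled_mask] = -np.inf  # never re-request already labeled points
    budget = min(budget, int((~labeled_mask).sum()))
    return np.argsort(-scores)[:budget]

def pseudo_labels(probs, labeled_mask, threshold=0.9):
    """Self-training step (assumed rule): return (indices, labels) of the
    unlabeled points whose top class probability is at least `threshold`."""
    confidence = probs.max(axis=1)
    keep = (~labeled_mask) & (confidence >= threshold)
    idx = np.flatnonzero(keep)
    return idx, probs[idx].argmax(axis=1)

# Toy softmax outputs for five points and three classes (made-up values).
probs = np.array([
    [0.98, 0.01, 0.01],  # already annotated by the user
    [0.34, 0.33, 0.33],  # very uncertain
    [0.05, 0.92, 0.03],  # confident prediction
    [0.50, 0.50, 0.00],  # uncertain between two classes
    [0.60, 0.30, 0.10],  # moderately uncertain
])
labeled_mask = np.array([True, False, False, False, False])

to_annotate = select_for_annotation(probs, labeled_mask, budget=2)
pl_idx, pl_labels = pseudo_labels(probs, labeled_mask, threshold=0.9)
```

In a full pipeline, one would presumably alternate these two steps with retraining: the user labels the points in `to_annotate`, the confident predictions in `pl_idx` are added as pseudo-labels, and the network is trained again before the next round.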