Improving Semantic Segmentation via Self-Training
This work addresses the cost of annotation for semantic segmentation in computer vision, offering an incremental improvement over existing methods.
The paper reduces the need for large amounts of pixelwise annotation in semantic segmentation through a self-training, semi-supervised approach. It reports state-of-the-art results on the Cityscapes, CamVid, and KITTI datasets with significantly less supervision, and proposes a fast training schedule that speeds up training by up to 2x without performance loss.
Deep learning usually achieves the best results with complete supervision. In the case of semantic segmentation, this means that large amounts of pixelwise annotations are required to learn accurate models. In this paper, we show that we can obtain state-of-the-art results using a semi-supervised approach, specifically a self-training paradigm. We first train a teacher model on labeled data, and then generate pseudo labels on a large set of unlabeled data. Our robust training framework can digest human-annotated and pseudo labels jointly and achieves top performance on the Cityscapes, CamVid and KITTI datasets while requiring significantly less supervision. We also demonstrate the effectiveness of self-training on a challenging cross-domain generalization task, outperforming the conventional finetuning method by a large margin. Lastly, to alleviate the computational burden caused by the large amount of pseudo labels, we propose a fast training schedule that accelerates the training of segmentation models by up to 2x without performance degradation.
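The self-training steps described above (a teacher produces pseudo labels on unlabeled images, which are then combined with human annotations) can be illustrated with a minimal sketch. This is not the paper's implementation: the names `generate_pseudo_labels`, `build_joint_training_set`, and `toy_teacher` are hypothetical, the teacher is a stand-in function rather than a trained network, and taking the per-pixel argmax as the pseudo label is an assumption made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for illustration: an image is a 2D grid of pixel values,
# and a score map holds one list of per-class scores for every pixel.
Image = List[List[float]]
ScoreMap = List[List[List[float]]]
LabelMap = List[List[int]]


def argmax(scores: Sequence[float]) -> int:
    """Return the index of the largest score (the first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def generate_pseudo_labels(
    teacher: Callable[[Image], ScoreMap],
    unlabeled_images: Sequence[Image],
) -> List[Tuple[Image, LabelMap]]:
    """Run the teacher on each unlabeled image and keep the per-pixel argmax class.

    Assumption: hard (argmax) pseudo labels; the source does not specify this.
    """
    pseudo = []
    for image in unlabeled_images:
        score_map = teacher(image)
        labels = [[argmax(pixel) for pixel in row] for row in score_map]
        pseudo.append((image, labels))
    return pseudo


def build_joint_training_set(
    labeled: Sequence[Tuple[Image, LabelMap]],
    pseudo_labeled: Sequence[Tuple[Image, LabelMap]],
) -> List[Tuple[Image, LabelMap, str]]:
    """Concatenate human-annotated and pseudo-labeled pairs, tagging each with its source."""
    joint = [(img, lab, "human") for img, lab in labeled]
    joint += [(img, lab, "pseudo") for img, lab in pseudo_labeled]
    return joint


def toy_teacher(image: Image) -> ScoreMap:
    """Stand-in for a trained teacher: two classes, scored by pixel brightness."""
    return [[[1.0 - p, p] for p in row] for row in image]


if __name__ == "__main__":
    unlabeled = [[[0.1, 0.9], [0.6, 0.2]]]
    labeled = [([[0.0]], [[0]])]
    pseudo = generate_pseudo_labels(toy_teacher, unlabeled)
    joint = build_joint_training_set(labeled, pseudo)
    print(len(joint), [source for _, _, source in joint])
```

In a real pipeline the joint set would be used to train a student segmentation model; the source tag is kept here only so that the two kinds of labels remain distinguishable, for example if one wanted to weight them differently.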