CV, AI · Mar 21, 2022

Semantic Segmentation with Active Semi-Supervised Learning

Aneesh Rangnekar, Christopher Kanan, Matthew Hoffman

arXiv:2203.10730v3 · 14.540 citations · h-index: 38

Originality: Incremental advance

AI Analysis

This work addresses the high cost of pixel-wise annotation in semantic segmentation and offers a more efficient route to dataset creation. The contribution is incremental, since it builds on existing active learning and semi-supervised learning methods.

The paper aims to reduce human annotation effort for semantic segmentation. It proposes an algorithm that combines active learning with semi-supervised learning and reaches over 95% of full-training performance using less than 17% of the training data on the CamVid and CityScapes datasets.

Using deep learning, we now have the ability to create exceptionally good semantic segmentation systems; however, collecting the prerequisite pixel-wise annotations for training images remains expensive and time-consuming. Therefore, it would be ideal to minimize the number of human annotations needed when creating a new dataset. Here, we address this problem by proposing a novel algorithm that combines active learning and semi-supervised learning. Active learning is an approach for identifying the best unlabeled samples to annotate. While there has been work on active learning for segmentation, most methods require annotating all pixel objects in each image, rather than only the most informative regions. We argue that this is inefficient. Instead, our active learning approach aims to minimize the number of annotations per-image. Our method is enriched with semi-supervised learning, where we use pseudo labels generated with a teacher-student framework to identify image regions that help disambiguate confused classes. We also integrate mechanisms that enable better performance on imbalanced label distributions, which have not been studied previously for active learning in semantic segmentation. In experiments on the CamVid and CityScapes datasets, our method obtains over 95% of the network's performance on the full-training set using less than 17% of the training data, whereas the previous state of the art required 40% of the training data.
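To make the two ingredients in the abstract more concrete, here is a minimal NumPy sketch of region-level selection and teacher-generated pseudo labels. It is not the authors' implementation: the mean-entropy region score, the fixed confidence threshold, the EMA teacher update, and every function and parameter name (`select_regions`, `pseudo_labels`, `ema_update`, `region`, `budget`, `threshold`, `momentum`) are assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np

def pixel_entropy(probs):
    """Shannon entropy per pixel for a (C, H, W) softmax map."""
    eps = 1e-12
    return -(probs * np.log(probs + eps)).sum(axis=0)

def select_regions(probs, region=4, budget=2):
    """Return (row, col) grid indices of the `budget` most uncertain regions.

    The image is tiled into non-overlapping region x region squares and each
    square is scored by its mean pixel entropy (an assumed criterion).
    """
    ent = pixel_entropy(probs)
    h, w = ent.shape
    gh, gw = h // region, w // region
    # Crop to a multiple of the region size, then average inside each tile.
    tiles = ent[:gh * region, :gw * region].reshape(gh, region, gw, region)
    scores = tiles.mean(axis=(1, 3))
    # Sort tiles from most to least uncertain and keep the first `budget`.
    top = np.argsort(scores.ravel())[::-1][:budget]
    return [tuple(int(i) for i in np.unravel_index(t, scores.shape)) for t in top]

def pseudo_labels(teacher_probs, threshold=0.9, ignore_index=255):
    """Hard labels from the teacher; low-confidence pixels are ignored."""
    labels = teacher_probs.argmax(axis=0)
    labels[teacher_probs.max(axis=0) < threshold] = ignore_index
    return labels

def ema_update(teacher, student, momentum=0.99):
    """Move each teacher weight toward the matching student weight (assumed EMA teacher)."""
    return {k: momentum * teacher[k] + (1 - momentum) * student[k] for k in teacher}
```

In a loop built on this sketch, only the regions returned by `select_regions` would be sent to a human annotator, while the remaining pixels would be supervised by `pseudo_labels` wherever the teacher is confident. Scoring regions rather than whole images is what keeps the per-image annotation count low.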
