CV · Dec 14, 2020

Scaling Semantic Segmentation Beyond 1K Classes on a Single GPU

arXiv:2012.07489v2 · 12 citations
AI Analysis

This work gives researchers and practitioners a way to scale semantic segmentation models to a much larger number of classes without increasing memory demands. It is an incremental improvement over existing methods.

This paper addresses the inability of semantic segmentation models to handle a large number of classes because of memory overhead. The authors propose an embedding-based training methodology that reduces the space complexity of the model's output from O(C) to O(1), where C is the number of classes, enabling training on more than 1K classes with a single GPU. Their method achieves comparable or better mIoU than DeeplabV3+ on standard datasets, and three times the mIoU of DeeplabV3+ on a 1284-class dataset.
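
To see why an O(C) output becomes the bottleneck, the arithmetic below sizes only the dense output tensor of a segmentation network: one float32 score map per class versus one fixed-size embedding per pixel. The batch size, crop size, and embedding dimension D = 64 are assumptions chosen for illustration, not values from the paper, and the sketch ignores gradients, intermediate activations, and any per-class parameter table.

```python
def output_tensor_bytes(batch, channels, height, width, bytes_per_elem=4):
    """Bytes for one dense float32 tensor of shape (batch, channels, height, width)."""
    return batch * channels * height * width * bytes_per_elem


GIB = 2 ** 30

# Assumed training setup (illustrative only, not taken from the paper).
B, H, W = 8, 512, 512
C = 1284   # number of classes in the LVIS/COCO-derived dataset
D = 64     # hypothetical embedding size; fixed no matter how large C gets

logits = output_tensor_bytes(B, C, H, W)   # one score per class per pixel: O(C)
embeds = output_tensor_bytes(B, D, H, W)   # one D-dim embedding per pixel: O(1) in C

print(f"per-class score maps: {logits / GIB:.2f} GiB")   # 10.03 GiB
print(f"per-pixel embeddings: {embeds / GIB:.2f} GiB")   # 0.50 GiB
```

Doubling C doubles the first figure but leaves the second unchanged, which is the sense in which the embedding output is constant in the number of classes.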

State-of-the-art object detection and image classification methods perform impressively on more than 9k and 10k classes, respectively. In contrast, semantic segmentation datasets contain relatively few classes. This is unsurprising given the scarcity of labeled data and the high computational cost of segmentation. In this paper, we propose a novel training methodology that scales existing semantic segmentation models to a large number of semantic classes without increasing memory overhead. In our embedding-based scalable segmentation approach, we reduce the space complexity of the segmentation model's output from O(C) to O(1), propose a method for approximating the ground-truth class probability, and use it to compute the cross-entropy loss. The approach is general: any state-of-the-art segmentation model can adopt it to scale gracefully to any number of semantic classes on a single GPU. When applied to the DeeplabV3+ model with different backbones, our approach achieves similar, and in some cases better, mIoU on the Cityscapes, Pascal VOC, ADE20k, and COCO-Stuff10k datasets. We demonstrate a clear benefit of our approach on a dataset with 1284 classes, bootstrapped from LVIS and COCO annotations, where it achieves three times the mIoU of the DeeplabV3+ model.
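
The abstract does not spell out how the ground-truth class probability is approximated, so the sketch below substitutes a generic, sampled-softmax-style stand-in to show the overall shape of an embedding-based loss. It is not the authors' method. Assumptions: the network emits a D-dimensional embedding per pixel, each class has a learned embedding row (a hypothetical C x D table), and the softmax for each pixel is taken over its true class plus K negative classes sampled once per step, so no N x C score map is ever built.

```python
import numpy as np


def approx_cross_entropy(pixel_emb, gt, class_emb, num_neg, rng):
    """Sampled-softmax stand-in for per-pixel cross-entropy (hypothetical helper).

    pixel_emb: (N, D) per-pixel embeddings predicted by the network (N = B*H*W)
    gt:        (N,)   integer ground-truth class ids in [0, C)
    class_emb: (C, D) learned class embeddings, one row per class
    num_neg:   K, number of negative classes sampled for this step (K << C)
    """
    num_classes = class_emb.shape[0]
    # Score of each pixel's true class: an (N,) vector with no C dimension.
    pos = np.einsum("nd,nd->n", pixel_emb, class_emb[gt])
    # One set of K random negative classes shared by every pixel in the batch.
    neg_ids = rng.choice(num_classes, size=num_neg, replace=False)
    neg = pixel_emb @ class_emb[neg_ids].T                  # (N, K)
    # If a sampled negative is a pixel's own class, exclude it from that pixel's softmax.
    neg = np.where(neg_ids[None, :] == gt[:, None], -np.inf, neg)
    scores = np.concatenate([pos[:, None], neg], axis=1)    # (N, 1 + K)
    # Numerically stable log-sum-exp over {true class} plus the sampled negatives.
    m = scores.max(axis=1, keepdims=True)
    log_z = m[:, 0] + np.log(np.exp(scores - m).sum(axis=1))
    return float(np.mean(log_z - pos))                       # mean of -log p(gt)


rng = np.random.default_rng(0)
C, D, N = 1284, 64, 4096   # N pixels, e.g. a flattened 64x64 crop
class_emb = rng.normal(size=(C, D)) / np.sqrt(D)
pixel_emb = rng.normal(size=(N, D))
gt = rng.integers(0, C, size=N)

loss = approx_cross_entropy(pixel_emb, gt, class_emb, num_neg=128, rng=rng)
print(f"approximate cross-entropy: {loss:.3f}")
```

Per step, the memory for scores grows with N x (1 + K) rather than N x C. Setting K = C scores every class and recovers exact softmax cross-entropy, which is a convenient sanity check for an implementation like this.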

Code Implementations: 1 repo