CV · Nov 18, 2017

Transferable Semi-supervised Semantic Segmentation

Huaxin Xiao, Yunchao Wei, Yu Liu, Maojun Zhang, Jiashi Feng

arXiv:1711.06828v2 · 8.0 · 12 citations

Originality: Highly original

AI Analysis

The paper addresses the limited scalability and applicability of semantic segmentation models in real applications by reducing the amount of pixel-level annotation they require.

The paper tackles data scarcity in semantic segmentation by proposing a transferable semi-supervised model. The model transfers segmentation knowledge from categories with pixel-level annotations to unseen categories that have only image-level annotations. On PASCAL VOC 2012, it reaches 96.5% and 89.4% of the fully-supervised baseline's performance when 50% and 0% of the categories, respectively, have pixel-level annotations.

The performance of deep learning based semantic segmentation models heavily depends on sufficient data with careful annotations. However, even the largest public datasets only provide samples with pixel-level annotations for rather limited semantic categories. Such data scarcity critically limits scalability and applicability of semantic segmentation models in real applications. In this paper, we propose a novel transferable semi-supervised semantic segmentation model that can transfer the learned segmentation knowledge from a few strong categories with pixel-level annotations to unseen weak categories with only image-level annotations, significantly broadening the applicable territory of deep segmentation models. In particular, the proposed model consists of two complementary and learnable components: a Label transfer Network (L-Net) and a Prediction transfer Network (P-Net). The L-Net learns to transfer the segmentation knowledge from strong categories to the images in the weak categories and produces coarse pixel-level semantic maps, by effectively exploiting the similar appearance shared across categories. Meanwhile, the P-Net tailors the transferred knowledge through a carefully designed adversarial learning strategy and produces refined segmentation results with better details. Integrating the L-Net and P-Net achieves 96.5% and 89.4% performance of the fully-supervised baseline using 50% and 0% categories with pixel-level annotations respectively on PASCAL VOC 2012. With such a novel transfer mechanism, our proposed model is easily generalizable to a variety of new categories, only requiring image-level annotations, and offers appealing scalability in real applications.
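The strong/weak category setup and the "percentage of the fully-supervised baseline" figure described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the choice of which categories count as strong (here simply the first half of the list), and the use of a mean-IoU-style score as the input to the ratio are all assumptions made for illustration. Only the 20 PASCAL VOC 2012 object category names are taken as fact.

```python
# The 20 object categories of PASCAL VOC 2012 (background excluded).
VOC_CATEGORIES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def split_categories(categories, strong_fraction):
    """Split categories into strong (pixel-level) and weak (image-level only).

    Hypothetical helper: taking the first part of the list as "strong" is an
    arbitrary choice for this sketch, not the split used in the paper.
    """
    n_strong = round(len(categories) * strong_fraction)
    return categories[:n_strong], categories[n_strong:]


def annotation_for(category, strong):
    """Return the kind of annotation available for a category."""
    return "pixel-level" if category in strong else "image-level"


def relative_performance(model_score, baseline_score):
    """Express a model's score as a percentage of the fully-supervised baseline.

    Assumes both scores use the same metric (for example mean IoU).
    """
    return 100.0 * model_score / baseline_score


# The 50% setting: half of the categories keep pixel-level annotations.
strong, weak = split_categories(VOC_CATEGORIES, 0.5)
for name in ("cat", "train"):
    print(name, "->", annotation_for(name, strong))
```

In this sketch, `relative_performance` takes placeholder scores supplied by the caller; the 96.5% and 89.4% figures quoted above are the paper's reported values and are not reproduced by the code.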
