CV · Dec 27, 2018

S4-Net: Geometry-Consistent Semi-Supervised Semantic Segmentation

arXiv:1812.10717v2 · 4 citations
Originality: Incremental advance
AI Analysis

The paper addresses the cost of annotating data for semantic segmentation in rigid scenes. The advance appears incremental, since it applies existing semi-supervised techniques to a specific context.

The paper tackles semantic segmentation with limited manual annotations by enforcing geometric 3D consistency between multiple views, and shows that one manually labeled image per scene is sufficient for high performance on the LabelFusion dataset.

We show that it is possible to learn semantic segmentation from very limited amounts of manual annotations, by enforcing geometric 3D constraints between multiple views. More exactly, image locations corresponding to the same physical 3D point should all have the same label. We show that introducing such constraints during learning is very effective, even when no manual label is available for a 3D point, and can be done simply by applying techniques from 'general' semi-supervised learning to the context of semantic segmentation. To demonstrate this idea, we use RGB-D image sequences of rigid scenes, for a 4-class segmentation problem derived from the ScanNet dataset. Starting from RGB-D sequences with a few annotated frames, we show that we can incorporate RGB-D sequences without any manual annotations to improve the performance, which makes our approach very convenient. Furthermore, we demonstrate our approach for semantic segmentation of objects on the LabelFusion dataset, where we show that one manually labeled image in a scene is sufficient for high performance on the whole scene.
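The constraint described in the abstract (pixels that see the same 3D point should receive the same label) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a pinhole camera with intrinsics `K` shared by both views, a known rigid transform `T_ab` from camera A to camera B, and a squared-difference penalty on class probabilities; the function names `warp_pixels` and `consistency_loss` are hypothetical, and occlusion handling is omitted for brevity.

```python
import numpy as np


def warp_pixels(depth_a, K, T_ab):
    """Map every pixel of view A to its location in view B.

    depth_a: (H, W) depth map of view A (0 marks a missing measurement)
    K:       (3, 3) pinhole intrinsics, assumed identical for both views
    T_ab:    (4, 4) rigid transform from camera A to camera B coordinates
    Returns integer (u_b, v_b) pixel coordinates in B and a validity mask.
    """
    h, w = depth_a.shape
    v, u = np.mgrid[0:h, 0:w]  # v: row index, u: column index
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)

    # Back-project pixels to 3D points in camera A coordinates.
    pts_a = (np.linalg.inv(K) @ pix.T) * depth_a.reshape(1, -1)
    # Express the same 3D points in camera B coordinates.
    pts_b = T_ab[:3, :3] @ pts_a + T_ab[:3, 3:4]

    z = pts_b[2]
    safe_z = np.where(z > 1e-6, z, 1.0)  # avoid division by zero
    proj = K @ pts_b
    u_b = np.rint(proj[0] / safe_z).astype(int)
    v_b = np.rint(proj[1] / safe_z).astype(int)

    # Keep pixels with measured depth that land in front of B and inside its image.
    valid = (
        (depth_a.reshape(-1) > 0) & (z > 1e-6)
        & (u_b >= 0) & (u_b < w) & (v_b >= 0) & (v_b < h)
    )
    return u_b.reshape(h, w), v_b.reshape(h, w), valid.reshape(h, w)


def consistency_loss(prob_a, prob_b, u_b, v_b, valid):
    """Mean squared difference between class probabilities of matched pixels.

    prob_a, prob_b: (H, W, C) per-pixel class probabilities for views A and B.
    No manual labels are needed: the loss only compares the two predictions.
    """
    if not valid.any():
        return 0.0
    matched_b = prob_b[v_b[valid], u_b[valid]]  # (N, C) predictions in B
    diff = prob_a[valid] - matched_b
    return float(np.mean(np.sum(diff ** 2, axis=-1)))
```

In a training loop, a term like this would be added to the usual supervised loss on the few annotated frames, so that unlabeled frames still contribute a learning signal. A practical version would also compare the warped depth against view B's own depth map to discard occluded points.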

