CV · Dec 4, 2018

Multiview Cross-supervision for Semantic Segmentation

arXiv:1812.01738v1 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses data scarcity in semantic segmentation for customized tasks, such as segmenting non-human species or a subject of interest in social videos, by introducing a cross-supervision method. It appears incremental because it builds on existing multiview geometry concepts.

The paper tackles the challenge of limited labeled data in customized semantic segmentation by proposing a semi-supervised framework that leverages multiview image streams for cross-supervision, achieving pixel-level recognition in real-world scenarios where large-scale annotation is infeasible.

This paper presents a semi-supervised learning framework for a customized semantic segmentation task using multiview image streams. A key challenge of the customized task lies in the limited accessibility of labeled data, because manual annotation requires prohibitive effort. We hypothesize that it is possible to leverage multiview image streams that are linked through the underlying 3D geometry, which can provide an additional supervisory signal to train a segmentation model. We formulate a new cross-supervision method using a shape belief transfer: the segmentation belief in one image is used to predict that of the other image through epipolar geometry, analogous to shape-from-silhouette. The shape belief transfer provides upper and lower bounds on the segmentation for the unlabeled data, and the gap between them asymptotically approaches zero as the number of labeled views increases. We integrate this theory to design a novel network that is agnostic to camera calibration, network model, and semantic category, and that bypasses the intermediate process of suboptimal 3D reconstruction. We validate this network by recognizing a customized semantic category per pixel from real-world visual data, including non-human species and a subject of interest in social videos, where attaining large-scale annotation data is infeasible.
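The shape-from-silhouette analogy in the abstract can be illustrated with a small sketch. It is not the paper's formulation. It covers only the upper-bound intuition: a pixel in an unlabeled view can be foreground only if its epipolar line in a labeled view crosses foreground. The function names, the convention that `F` maps a target pixel to a line in the source image, the max-along-line rule, and the min-over-views combination are all assumptions made for illustration.

```python
import numpy as np

def transfer_upper_bound(belief_src, F, x_target, n_samples=200):
    """Hypothetical helper: upper bound on the foreground belief of a
    target pixel, taken as the max source belief along its epipolar line.
    Assumes F maps a target pixel (u, v, 1) to a line (a, b, c) in the
    source image, with a*x + b*y + c = 0."""
    h, w = belief_src.shape
    u, v = x_target
    a, b, c = F @ np.array([u, v, 1.0])
    if max(abs(a), abs(b)) < 1e-12:
        return 0.0  # degenerate line: no constraint can be read off
    if abs(b) >= abs(a):
        # Line is closer to horizontal: sample x, solve for y.
        xs = np.linspace(0.0, w - 1.0, n_samples)
        ys = -(a * xs + c) / b
    else:
        # Line is closer to vertical: sample y, solve for x.
        ys = np.linspace(0.0, h - 1.0, n_samples)
        xs = -(b * ys + c) / a
    cols = np.rint(xs).astype(int)
    rows = np.rint(ys).astype(int)
    inside = (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    if not inside.any():
        return 0.0  # the line never enters the source image
    return float(belief_src[rows[inside], cols[inside]].max())

def combine_views(upper_bounds):
    """Assumed combination rule: the tightest bound over all source views."""
    return float(min(upper_bounds))

# Toy example: two cameras related by a pure horizontal translation,
# so every epipolar line is an image row (F = [t]_x with t = (1, 0, 0)).
F_rows = np.array([[0.0, 0.0, 0.0],
                   [0.0, 0.0, -1.0],
                   [0.0, 1.0, 0.0]])
belief = np.zeros((5, 6))
belief[2:4, 1:3] = 1.0  # foreground blob in the labeled (source) view

on_blob_row = transfer_upper_bound(belief, F_rows, (4, 2))   # row 2 crosses the blob
off_blob_row = transfer_upper_bound(belief, F_rows, (4, 0))  # row 0 misses it
print(on_blob_row, off_blob_row)
```

Under these assumptions, adding source views can only lower the combined bound, which is consistent with the abstract's claim that the gap between the bounds shrinks as labeled views are added.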

