CVApr 11, 2020

Improving Semantic Segmentation through Spatio-Temporal Consistency Learned from Videos

arXiv:2004.05324v24 citations
Originality Incremental advance
AI Analysis

This work addresses the challenge of improving semantic segmentation for computer vision applications, but it is incremental as it builds on existing unsupervised learning techniques.

The paper tackles the problem of improving single-image semantic segmentation by using unsupervised learning of depth, egomotion, and camera intrinsics from videos to enforce 3D-geometric and temporal consistency, resulting in enhanced segmentation quality or reduced label requirements, with experiments conducted on the ScanNet dataset.

We leverage unsupervised learning of depth, egomotion, and camera intrinsics to improve the performance of single-image semantic segmentation, by enforcing 3D-geometric and temporal consistency of segmentation masks across video frames. The predicted depth, egomotion, and camera intrinsics are used to provide an additional supervision signal to the segmentation model, significantly enhancing its quality, or, alternatively, reducing the number of labels the segmentation model needs. Our experiments were performed on the ScanNet dataset.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes