CV · Mar 10, 2020

Learning Video Object Segmentation from Unlabeled Videos

arXiv:2003.05020v1 · 153 citations
AI Analysis

This work addresses the high annotation cost of video object segmentation, a problem of interest to computer vision researchers, though it appears incremental because it builds on existing unsupervised and weakly supervised methods.

The paper tackles video object segmentation by learning from unlabeled videos to reduce annotation burden, achieving promising performance in zero-shot and one-shot settings.

We propose a new method for video object segmentation (VOS) that addresses object pattern learning from unlabeled videos, unlike most existing methods, which rely heavily on extensive annotated data. We introduce a unified unsupervised/weakly supervised learning framework, called MuG, that comprehensively captures intrinsic properties of VOS at multiple granularities. Our approach can help advance understanding of visual patterns in VOS and significantly reduce annotation burden. With a carefully-designed architecture and strong representation learning ability, our learned model can be applied to diverse VOS settings, including object-level zero-shot VOS, instance-level zero-shot VOS, and one-shot VOS. Experiments demonstrate promising performance in these settings, as well as the potential of MuG in leveraging unlabeled data to further improve the segmentation accuracy.

Code Implementations: 1 repo
