CV · Mar 28, 2019

BubbleNets: Learning to Select the Guidance Frame in Video Object Segmentation by Deep Sorting Frames

arXiv:1903.11779v2 · 43 citations
Originality: Incremental advance
AI Analysis

This paper addresses suboptimal selection of the frame to annotate in video object segmentation. The advance is incremental because it builds on existing segmentation methods.

The paper tackles the problem of selecting the optimal frame for annotation in semi-supervised video object segmentation, achieving an 11% relative improvement in segmentation performance on the DAVIS benchmark without modifying the underlying segmentation method.

Semi-supervised video object segmentation has made significant progress on real and challenging videos in recent years. The current paradigm for segmentation methods and benchmark datasets is to segment objects in video provided a single annotation in the first frame. However, we find that segmentation performance across the entire video varies dramatically when selecting an alternative frame for annotation. This paper addresses the problem of learning to suggest the single best frame across the video for user annotation; this is, in fact, never the first frame of video. We achieve this by introducing BubbleNets, a novel deep sorting network that learns to select frames using a performance-based loss function that enables the conversion of expansive amounts of training examples from already existing datasets. Using BubbleNets, we are able to achieve an 11% relative improvement in segmentation performance on the DAVIS benchmark without any changes to the underlying method of segmentation.
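
The frame-selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a pairwise comparator, here the hypothetical `predict_diff(a, b)`, that returns a positive number when annotating frame `a` is predicted to give better segmentation than annotating frame `b`. In BubbleNets that role would be played by the learned network; in this sketch it is replaced by a toy lookup table of made-up scores. A single bubble-sort-style pass then carries the predicted-best frame through the candidates.

```python
from typing import Callable, Sequence


def select_guidance_frame(
    frames: Sequence[int],
    predict_diff: Callable[[int, int], float],
) -> int:
    """Return the frame index predicted to be the best one to annotate.

    Performs one bubble-sort-style pass: the current best frame is compared
    with each remaining candidate and replaced whenever the comparator
    predicts that the candidate would perform better.
    """
    if not frames:
        raise ValueError("need at least one candidate frame")
    best = frames[0]
    for candidate in frames[1:]:
        # predict_diff(a, b) > 0 means: annotating a is predicted to beat b.
        if predict_diff(candidate, best) > 0:
            best = candidate
    return best


# Toy stand-in for the learned comparator (assumption: scores are made up).
toy_scores = {0: 0.61, 1: 0.70, 2: 0.74, 3: 0.66}


def toy_predict_diff(a: int, b: int) -> float:
    """Hypothetical comparator: difference of illustrative per-frame scores."""
    return toy_scores[a] - toy_scores[b]


best_frame = select_guidance_frame(list(toy_scores), toy_predict_diff)
print(best_frame)
```

Because only the best frame is needed, one pass with a pairwise comparator is enough; a full sort would matter only if a complete ranking of frames were required.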

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
