CV · Nov 2, 2020

Reducing the Annotation Effort for Video Object Segmentation Datasets

arXiv:2011.01142v1 · 26 citations
AI Analysis

This work reduces the annotation effort for video object segmentation, letting researchers and practitioners build larger datasets with less manual labor. The contribution is incremental: it builds on existing pseudo-labeling and VOS methods.

The paper tackles the problem of high annotation cost for video object segmentation datasets by using a deep convolutional network to generate pixel-level pseudo-labels from cheaper bounding box annotations, showing that adding only one manually annotated mask per object allows training to reach nearly the same performance as with fully segmented videos. It introduces the TAO-VOS benchmark, which remains challenging for state-of-the-art methods, revealing their shortcomings.

For further progress in video object segmentation (VOS), larger, more diverse, and more challenging datasets will be necessary. However, densely labeling every frame with pixel masks does not scale to large datasets. We use a deep convolutional network to automatically create pseudo-labels on a pixel level from much cheaper bounding box annotations and investigate how far such pseudo-labels can carry us for training state-of-the-art VOS approaches. A very encouraging result of our study is that adding a manually annotated mask in only a single video frame for each object is sufficient to generate pseudo-labels which can be used to train a VOS method to reach almost the same performance level as when training with fully segmented videos. We use this workflow to create pixel pseudo-labels for the training set of the challenging tracking dataset TAO, and we manually annotate a subset of the validation set. Together, we obtain the new TAO-VOS benchmark, which we make publicly available at www.vision.rwth-aachen.de/page/taovos. While the performance of state-of-the-art methods on existing datasets starts to saturate, TAO-VOS remains very challenging for current algorithms and reveals their shortcomings.
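The workflow described in the abstract (per-frame bounding boxes in, pixel-level pseudo-labels out, with one manually annotated mask per object) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `placeholder_box_to_mask` is a hypothetical stand-in that fills an ellipse inside the box and is not the authors' network, and `make_pseudo_labels` only keeps the manual mask for its own frame. It does not model how that mask improves the pseudo-labels generated for the other frames.

```python
import numpy as np


def placeholder_box_to_mask(frame, box):
    """Hypothetical stand-in for a box-to-mask network.

    Fills the ellipse inscribed in the box (x0, y0, x1, y1). A real
    pipeline would run a trained segmentation network here instead.
    """
    h, w = frame.shape[:2]
    x0, y0, x1, y1 = box
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    rx = max((x1 - x0) / 2.0, 1e-6)
    ry = max((y1 - y0) / 2.0, 1e-6)
    return ((xs - cx) / rx) ** 2 + ((ys - cy) / ry) ** 2 <= 1.0


def make_pseudo_labels(frames, boxes, manual_masks=None,
                       box_to_mask=placeholder_box_to_mask):
    """Return one boolean mask per frame for a single object.

    frames:       list of H x W x C arrays
    boxes:        list of (x0, y0, x1, y1) tuples, one per frame
    manual_masks: optional dict {frame_index: H x W mask}; frames listed
                  here keep the manual mask instead of a pseudo-label
    """
    manual_masks = manual_masks or {}
    labels = []
    for t, (frame, box) in enumerate(zip(frames, boxes)):
        if t in manual_masks:
            # Manually annotated frame: use the human mask directly.
            labels.append(np.asarray(manual_masks[t]).astype(bool))
        else:
            # All other frames: derive a pseudo-label from the box.
            labels.append(box_to_mask(frame, box))
    return labels


# Usage: three 8x8 frames, the same box in each, one manual mask at frame 0.
frames = [np.zeros((8, 8, 3)) for _ in range(3)]
boxes = [(2, 2, 6, 6)] * 3
manual = {0: np.ones((8, 8), dtype=bool)}
labels = make_pseudo_labels(frames, boxes, manual_masks=manual)
```

Passing the segmenter as the `box_to_mask` argument keeps the loop independent of any particular model, so a trained network could be swapped in without touching the rest of the sketch.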

