CV · Dec 10, 2019

Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation

arXiv:1912.04573v4 · 194 citations
AI Analysis

This work addresses video instance segmentation, the task of classifying, segmenting, and tracking object instances across the frames of a video. Its method is simpler than the ICCV 2019 challenge winner and uses orders of magnitude less labeled data.

The authors introduce MaskProp, which extends Mask R-CNN with a mask propagation branch to classify, segment, and track objects across frames. It achieves the best reported accuracy on the YouTube-VIS dataset.

We introduce a method for simultaneously classifying, segmenting and tracking object instances in a video sequence. Our method, named MaskProp, adapts the popular Mask R-CNN to video by adding a mask propagation branch that propagates frame-level object instance masks from each video frame to all the other frames in a video clip. This allows our system to predict clip-level instance tracks with respect to the object instances segmented in the middle frame of the clip. Clip-level instance tracks generated densely for each frame in the sequence are finally aggregated to produce video-level object instance segmentation and classification. Our experiments demonstrate that our clip-level instance segmentation makes our approach robust to motion blur and object occlusions in video. MaskProp achieves the best reported accuracy on the YouTube-VIS dataset, outperforming the ICCV 2019 video instance segmentation challenge winner despite being much simpler and using orders of magnitude less labeled data (1.3M vs 1B images and 860K vs 14M bounding boxes).
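
The aggregation step described above, where clip-level instance tracks are merged into video-level tracks, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the matching rule, so the sketch assumes a greedy merge based on mean mask IoU over the frames two tracks share. The function names, the 0.5 threshold, and the synthetic square masks are all hypothetical choices made for illustration.

```python
import numpy as np


def mask_iou(a, b):
    """Intersection-over-union of two boolean masks of equal shape."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def track_similarity(track_a, track_b):
    """Mean mask IoU over the frames two tracks share (0.0 if they share none).

    A track is a dict mapping frame index -> boolean mask.
    """
    shared = sorted(set(track_a) & set(track_b))
    if not shared:
        return 0.0
    return float(np.mean([mask_iou(track_a[t], track_b[t]) for t in shared]))


def aggregate_clip_tracks(clip_tracks, threshold=0.5):
    """Greedily merge clip-level tracks into video-level tracks.

    Assumption (not from the paper): a clip track joins the most similar
    existing video track if their similarity reaches `threshold`;
    otherwise it starts a new video track.
    """
    video_tracks = []
    for clip_track in clip_tracks:
        scores = [track_similarity(v, clip_track) for v in video_tracks]
        best = int(np.argmax(scores)) if scores else -1
        if best >= 0 and scores[best] >= threshold:
            # Keep the mask already stored for a frame; add only new frames.
            for t, mask in clip_track.items():
                video_tracks[best].setdefault(t, mask)
        else:
            video_tracks.append(dict(clip_track))
    return video_tracks


def make_toy_clip_tracks(num_frames=4, radius=1, size=8):
    """Synthetic data: two 3x3 squares sliding right, one clip per center frame."""

    def square(row, col):
        mask = np.zeros((size, size), dtype=bool)
        mask[row:row + 3, col:col + 3] = True
        return mask

    clip_tracks = []
    for center in range(num_frames):
        first = max(0, center - radius)
        last = min(num_frames, center + radius + 1)
        frames = range(first, last)
        clip_tracks.append({t: square(0, t) for t in frames})  # object A (top)
        clip_tracks.append({t: square(5, t) for t in frames})  # object B (bottom)
    return clip_tracks
```

Calling `aggregate_clip_tracks(make_toy_clip_tracks())` merges the eight overlapping clip tracks into one video-level track per object. A real system would also need to handle class scores, imperfect masks, and conflicting masks on shared frames, which this sketch ignores.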

