CV · Mar 25, 2021

Video Instance Segmentation with a Propose-Reduce Paradigm

arXiv:2103.13746v2 · 110 citations · Has Code
Originality: Highly original
AI Analysis

The paper addresses accurate and efficient video instance segmentation for computer vision applications. It presents its approach as a new paradigm rather than an incremental improvement over prior methods.

The paper introduces a Propose-Reduce paradigm that generates complete instance sequences for a video in a single step, which avoids the error accumulation that arises when prior methods merge per-frame or per-clip results. It reports state-of-the-art results: 47.6% AP on the YouTube-VIS validation set and 70.4% J&F on the DAVIS-UVOS validation set.

Video instance segmentation (VIS) aims to segment and associate all instances of predefined classes for each frame in videos. Prior methods usually obtain segmentation for a frame or clip first, and then merge the incomplete results by tracking or matching. These methods may cause error accumulation in the merging step. In contrast, we propose a new paradigm -- Propose-Reduce -- to generate complete sequences for input videos in a single step. We further build a sequence propagation head on the existing image-level instance segmentation network for long-term propagation. To ensure robustness and high recall of our proposed framework, multiple sequences are proposed, and redundant sequences of the same instance are reduced. We achieve state-of-the-art performance on two representative benchmark datasets: 47.6% AP on the YouTube-VIS validation set and 70.4% J&F on the DAVIS-UVOS validation set. Code is available at https://github.com/dvlab-research/ProposeReduce.
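
The "reduce" step described above, in which redundant sequences of the same instance are removed, can be illustrated with a small sketch. This is not taken from the paper's repository. It assumes that redundancy is measured by a mask IoU pooled over all frames and that sequences are pruned greedily by score, in the style of non-maximum suppression. The function names `sequence_iou` and `reduce_sequences`, the array layout `(T, H, W)`, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def sequence_iou(seq_a, seq_b):
    """IoU between two mask sequences of shape (T, H, W), pooled over all frames.

    Assumption: overlap is accumulated across the whole video rather than
    averaged per frame. The paper may define this differently.
    """
    a = np.asarray(seq_a, dtype=bool)
    b = np.asarray(seq_b, dtype=bool)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 0.0


def reduce_sequences(sequences, scores, iou_threshold=0.5):
    """Greedy sequence-level suppression (hypothetical helper).

    Visits sequences from highest to lowest score and keeps a sequence only
    if it overlaps every already-kept sequence by less than iou_threshold.
    Returns the indices of the kept sequences, in descending score order.
    """
    order = np.argsort(scores)[::-1]
    keep = []
    for idx in order:
        if all(sequence_iou(sequences[idx], sequences[k]) < iou_threshold
               for k in keep):
            keep.append(int(idx))
    return keep


# Toy example: three proposed sequences over 3 frames of 4x4 pixels.
T, H, W = 3, 4, 4
seq0 = np.zeros((T, H, W), dtype=bool)
seq0[:, :2, :2] = True            # instance A, top-left block
seq1 = seq0.copy()
seq1[:, 0, 2] = True              # near-duplicate of instance A
seq2 = np.zeros((T, H, W), dtype=bool)
seq2[:, 2:, 2:] = True            # instance B, bottom-right block

kept = reduce_sequences([seq0, seq1, seq2], scores=[0.9, 0.8, 0.7])
print(kept)  # [0, 2]
```

In the toy example the second sequence overlaps the first with an IoU of 0.8, so it is treated as a redundant proposal of the same instance and dropped, while the non-overlapping third sequence survives.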

Code Implementations: 1 repo
