CV | Mar 29, 2020

Learning a Weakly-Supervised Video Actor-Action Segmentation Model with a Wise Selection

arXiv:2003.13141v1 | 15 citations
AI Analysis

This work addresses video analysis for applications like surveillance or robotics, offering an incremental improvement in weakly-supervised segmentation methods.

The paper tackles weakly-supervised video actor-action segmentation (VAAS) by proposing WS^2, a framework that uses a learning-based selection of high-quality pseudo-annotations for training and a region integrity criterion as the stopping condition. It achieves state-of-the-art performance on weakly-supervised VOS and VAAS and is on par with the best fully-supervised method on VAAS.

We address weakly-supervised video actor-action segmentation (VAAS), which extends general video object segmentation (VOS) to additionally consider action labels of the actors. The most successful methods on VOS synthesize a pool of pseudo-annotations (PAs) and then refine them iteratively. However, they face challenges as to how to select from a massive amount of PAs high-quality ones, how to set an appropriate stop condition for weakly-supervised training, and how to initialize PAs pertaining to VAAS. To overcome these challenges, we propose a general Weakly-Supervised framework with a Wise Selection of training samples and model evaluation criterion (WS^2). Instead of blindly trusting quality-inconsistent PAs, WS^2 employs a learning-based selection to select effective PAs and a novel region integrity criterion as a stopping condition for weakly-supervised training. In addition, a 3D-Conv GCAM is devised to adapt to the VAAS task. Extensive experiments show that WS^2 achieves state-of-the-art performance on both weakly-supervised VOS and VAAS tasks and is on par with the best fully-supervised method on VAAS.
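The select-then-refine loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name below (`select_pseudo_annotations`, `train_with_selection`, `score_fn`, `train_step`, `refine_pas`, `eval_criterion`, `keep_ratio`) is hypothetical, the learned selector is reduced to a plain scoring callable, and `eval_criterion` is only a stand-in for the region integrity criterion, whose definition the abstract does not give.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def select_pseudo_annotations(
    pas: Sequence[Any],
    score_fn: Callable[[Any], float],
    keep_ratio: float = 0.5,
) -> List[Any]:
    """Keep the top-scoring fraction of pseudo-annotations (PAs).

    score_fn is a hypothetical stand-in for a learned quality estimator.
    """
    ranked = sorted(pas, key=score_fn, reverse=True)
    k = max(1, int(len(ranked) * keep_ratio))
    return ranked[:k]


def train_with_selection(
    initial_pas: Sequence[Any],
    score_fn: Callable[[Any], float],
    train_step: Callable[[Optional[Any], List[Any]], Any],
    refine_pas: Callable[[Any, List[Any]], List[Any]],
    eval_criterion: Callable[[Any], float],
    keep_ratio: float = 0.5,
    max_rounds: int = 10,
) -> Tuple[Optional[Any], float]:
    """Iteratively select PAs, train, and refine until the criterion stops improving.

    eval_criterion is an assumed label-free model score (a placeholder for a
    region-integrity-style measure); training stops at the first round in
    which it fails to improve, and the best model so far is returned.
    """
    pas = list(initial_pas)
    model: Optional[Any] = None
    best_model: Optional[Any] = None
    best_score = float("-inf")

    for _ in range(max_rounds):
        selected = select_pseudo_annotations(pas, score_fn, keep_ratio)
        model = train_step(model, selected)   # train on the selected PAs only
        score = eval_criterion(model)         # evaluate without ground truth
        if score <= best_score:               # stop condition: no improvement
            break
        best_model, best_score = model, score
        pas = refine_pas(model, pas)          # regenerate PAs with the new model

    return best_model, best_score
```

In this sketch the stop condition replaces a fixed epoch count, which is the role the abstract assigns to the region integrity criterion; how the actual criterion and selector are computed is left to the paper.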

