CV · Mar 12, 2018

Video Object Segmentation with Joint Re-identification and Attention-Aware Mask Propagation

arXiv:1803.04242v2 · 206 citations
Originality: Incremental advance
AI Analysis

This work addresses video object segmentation for computer vision applications, offering incremental improvements in handling occlusions and appearance changes.

The paper tackles the challenge of segmenting and tracking multiple objects in video, especially through occlusions, with a deep recurrent network that combines temporal mask propagation and re-identification in a single end-to-end trainable framework. It reports a global mean (the average of Region Jaccard and Boundary F measure) of 68.2 on the DAVIS 2017 test-dev set, ahead of the winning solution's 66.1 on the same partition.

The problem of video object segmentation can become extremely challenging when multiple instances co-exist. While each instance may exhibit large scale and pose variations, the problem is compounded when instances occlude each other, causing failures in tracking. In this study, we formulate a deep recurrent network that is capable of segmenting and tracking objects in video simultaneously by their temporal continuity, yet able to re-identify them when they re-appear after a prolonged occlusion. We combine both temporal propagation and re-identification functionalities into a single framework that can be trained end-to-end. In particular, we present a re-identification module with template expansion to retrieve missing objects despite their large appearance changes. In addition, we contribute a new attention-based recurrent mask propagation approach that is robust to distractors not belonging to the target segment. Our approach achieves a new state-of-the-art global mean (Region Jaccard and Boundary F measure) of 68.2 on the challenging DAVIS 2017 benchmark (test-dev set), outperforming the winning solution which achieves a global mean of 66.1 on the same partition.
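To make the reported metric concrete, here is a minimal sketch of how a global mean of this kind is formed: the Region Jaccard (J) is the intersection-over-union of a predicted and a ground-truth mask, the Boundary F measure (F) is the harmonic mean of boundary precision and recall, and the global mean is the average of the two. The function names, the toy masks, and the numbers in the usage lines are illustrative assumptions, not the benchmark's official evaluation code or results from the paper. Boundary extraction and matching are omitted, so the F helper takes precision and recall as given inputs.

```python
def region_jaccard(pred, gt):
    """Intersection-over-union of two binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (here: of boundary pixels)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def global_mean(j_mean, f_mean):
    """Average of the mean region score J and the mean boundary score F."""
    return (j_mean + f_mean) / 2.0


# Toy masks, purely illustrative.
pred = [[0, 1, 1],
        [0, 1, 0]]
gt = [[0, 1, 0],
      [0, 1, 1]]

j = region_jaccard(pred, gt)      # 2 overlapping pixels out of 4 in the union
f = f_measure(0.5, 0.5)           # placeholder boundary precision and recall
print(j, f, global_mean(j, f))
```

In a full evaluation, J and F would each be averaged over all objects and frames before the two means are combined.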
