CV · Jul 16, 2020

Kernelized Memory Network for Video Object Segmentation

arXiv:2007.08270v1 · 212 citations
Originality: Highly original
AI Analysis

This work addresses a key limitation of space-time memory networks (STM) in semi-supervised video object segmentation and reports a clear gain over prior state-of-the-art methods on standard benchmarks.

The authors tackle the mismatch between the non-local matching of space-time memory networks and the predominantly local nature of video object segmentation by proposing a kernelized memory network (KMN). KMN surpasses the previous state of the art by +5% on the DAVIS 2017 test-dev set and runs at 0.12 seconds per frame on the DAVIS 2016 validation set.

Semi-supervised video object segmentation (VOS) is the task of segmenting a target object throughout a video when the ground truth segmentation mask of that object is given in the first frame. Recently, space-time memory networks (STM) have received significant attention as a promising solution for semi-supervised VOS. However, an important point is overlooked when applying STM to VOS: the solution (STM) is non-local, but the problem (VOS) is predominantly local. To resolve this mismatch between STM and VOS, we propose a kernelized memory network (KMN). Before being trained on real videos, our KMN is pre-trained on static images, as in previous works. Unlike previous works, we use the Hide-and-Seek strategy in pre-training to obtain the best possible results in handling occlusions and extracting segment boundaries. The proposed KMN surpasses the state of the art on standard benchmarks by a significant margin (+5% on the DAVIS 2017 test-dev set). In addition, the runtime of KMN is 0.12 seconds per frame on the DAVIS 2016 validation set, and KMN requires little extra computation compared with STM.
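The abstract names Hide-and-Seek as the pre-training augmentation but does not describe it. As a rough illustration only, the sketch below shows the general idea behind Hide-and-Seek style augmentation: the image is divided into a grid and each cell is hidden at random, so the model has to cope with partially occluded objects. The function name `hide_and_seek`, the parameters `grid`, `hide_prob`, and `fill`, and their default values are assumptions made for this example; they are not taken from the paper, and the authors' actual pre-training pipeline may differ.

```python
import random


def hide_and_seek(image, grid=4, hide_prob=0.5, fill=0, rng=None):
    """Hide random grid cells of a 2D image given as a list of rows.

    Hypothetical helper for illustration, not the authors' code.
    Returns (augmented_copy, hidden_cells), where hidden_cells lists the
    (top, left) corner of every cell that was hidden.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    # Ceiling division so the grid covers the whole image.
    cell_h = -(-height // grid)
    cell_w = -(-width // grid)

    out = [row[:] for row in image]  # copy, so the input stays untouched
    hidden = []
    for top in range(0, height, cell_h):
        for left in range(0, width, cell_w):
            # Each cell is hidden independently with probability hide_prob.
            if rng.random() < hide_prob:
                hidden.append((top, left))
                for y in range(top, min(top + cell_h, height)):
                    for x in range(left, min(left + cell_w, width)):
                        out[y][x] = fill
    return out, hidden


if __name__ == "__main__":
    # Toy 8x8 "image" of ones; hidden cells are overwritten with zeros.
    toy = [[1] * 8 for _ in range(8)]
    augmented, cells = hide_and_seek(toy, rng=random.Random(0))
    for row in augmented:
        print("".join(str(v) for v in row))
    print("hidden cells:", cells)
```

In a segmentation setting, only the input image would normally be masked while the ground truth mask stays complete, so that the network learns to recover occluded regions; whether KMN does exactly this is not stated in the abstract above.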

