CV · Mar 30, 2020

Memory Aggregation Networks for Efficient Interactive Video Object Segmentation

arXiv:2003.13246v1 · 83 citations
AI Analysis

This work addresses the computational inefficiency of interactive video object segmentation, a task used in video editing and analysis applications, and represents an incremental improvement over existing methods.

The paper tackles this inefficiency with a unified framework, Memory Aggregation Networks (MA-Net), which integrates interaction and propagation into a single network. MA-Net achieves a J@60 score of 76.1% on the DAVIS 2018 validation set, outperforming state-of-the-art methods by more than 2.7%.
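J@60 builds on the DAVIS region-similarity measure J, the Jaccard index (intersection over union) between a predicted mask and the ground-truth mask. The sketch below computes J for binary masks and then reads off a score at 60 seconds from a curve of (elapsed time, mean J) points recorded after each interaction round. The helper name `j_at_time` and the use of linear interpolation between rounds are assumptions for illustration; the official DAVIS interactive evaluation toolkit may compute J@60 differently.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[bool]]


def jaccard(pred: Mask, gt: Mask) -> float:
    """Region similarity J = |pred ∩ gt| / |pred ∪ gt| for two binary masks."""
    if len(pred) != len(gt):
        raise ValueError("masks must have the same height")
    inter = union = 0
    for prow, grow in zip(pred, gt):
        if len(prow) != len(grow):
            raise ValueError("masks must have the same width")
        for p, g in zip(prow, grow):
            inter += bool(p and g)
            union += bool(p or g)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if union == 0 else inter / union


def j_at_time(curve: List[Tuple[float, float]], t_star: float = 60.0) -> float:
    """Hypothetical helper: J at t_star seconds, linearly interpolated from
    (elapsed_seconds, mean_J) points sorted by time (an assumption)."""
    if not curve:
        raise ValueError("curve must contain at least one point")
    if t_star <= curve[0][0]:
        # Before the first recorded round: hold the first score.
        return curve[0][1]
    for (t0, j0), (t1, j1) in zip(curve, curve[1:]):
        if t0 <= t_star <= t1:
            if t1 == t0:
                return j1
            return j0 + (j1 - j0) * (t_star - t0) / (t1 - t0)
    # Past the last interaction round: hold the final score.
    return curve[-1][1]


if __name__ == "__main__":
    pred = [[True, True], [False, False]]
    gt = [[True, False], [False, False]]
    print(jaccard(pred, gt))  # 0.5
    print(j_at_time([(0.0, 0.0), (40.0, 0.6), (80.0, 0.8)]))
```

In this example the predicted mask overlaps the ground truth on one pixel out of a two-pixel union, so J is 0.5; the 60-second score falls halfway between the rounds completed at 40 s and 80 s.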

Interactive video object segmentation (iVOS) aims at efficiently harvesting high-quality segmentation masks of the target object in a video with user interactions. Most previous state-of-the-art methods tackle iVOS with two independent networks, one for user interaction and one for temporal propagation, which leads to inefficiency during inference. In this work, we propose a unified framework, named Memory Aggregation Networks (MA-Net), to address the challenging iVOS task more efficiently. MA-Net integrates the interaction and propagation operations into a single network, which significantly improves the efficiency of iVOS in the multi-round interaction scheme. More importantly, we propose a simple yet effective memory aggregation mechanism that records informative knowledge from previous interaction rounds, greatly improving robustness in discovering challenging objects of interest. We conduct extensive experiments on the validation set of the DAVIS Challenge 2018 benchmark. In particular, MA-Net achieves a J@60 score of 76.1% without any bells and whistles, outperforming state-of-the-art methods by more than 2.7%.
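The abstract does not spell out how the memory is aggregated. As a rough illustration of the general idea of keeping knowledge from earlier rounds, the sketch below accumulates embeddings of user-scribbled pixels across interaction rounds and labels a pixel by nearest-neighbour matching against the accumulated foreground and background memories. The class name `ScribbleMemory`, the append-only aggregation rule, the toy 2-D embeddings, and the nearest-neighbour classifier are all hypothetical choices for illustration, not MA-Net's published design.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Vec = Tuple[float, ...]


class ScribbleMemory:
    """Hypothetical memory that aggregates scribble embeddings over rounds.

    Aggregation here is append-only: evidence from earlier rounds is never
    discarded, so later corrections add to (rather than replace) it.
    """

    def __init__(self) -> None:
        self.fg: List[Vec] = []  # embeddings of pixels the user marked as object
        self.bg: List[Vec] = []  # embeddings of pixels the user marked as background
        self.rounds = 0

    def add_round(self, fg_scribbles: Iterable[Sequence[float]],
                  bg_scribbles: Iterable[Sequence[float]]) -> None:
        """Record one interaction round's scribbles in the memory."""
        self.fg.extend(tuple(v) for v in fg_scribbles)
        self.bg.extend(tuple(v) for v in bg_scribbles)
        self.rounds += 1

    def is_foreground(self, pixel: Sequence[float]) -> bool:
        """Label a pixel embedding by its nearest memory entry (Euclidean)."""
        if not self.fg:
            return False
        if not self.bg:
            return True
        d_fg = min(math.dist(pixel, v) for v in self.fg)
        d_bg = min(math.dist(pixel, v) for v in self.bg)
        return d_fg < d_bg


if __name__ == "__main__":
    memory = ScribbleMemory()
    memory.add_round(fg_scribbles=[(0.0, 0.0)], bg_scribbles=[(10.0, 10.0)])
    print(memory.is_foreground((6.0, 6.0)))  # False: closer to the background scribble
    # A corrective scribble in round 2 is added to, not swapped for, round 1.
    memory.add_round(fg_scribbles=[(7.0, 7.0)], bg_scribbles=[])
    print(memory.is_foreground((6.0, 6.0)))  # True
```

Because the memory keeps every round's scribbles, a pixel mislabelled after the first round can be corrected by a later scribble without losing the background evidence from round 1; a real system would match learned per-pixel embeddings rather than the toy 2-D points used here.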
