CV | Feb 4, 2018

Efficient Video Object Segmentation via Network Modulation

arXiv:1802.01218v1 | 357 citations
Originality: Incremental advance
AI Analysis

This paper addresses the speed bottleneck for real-time video object segmentation applications, though it is an incremental improvement over existing fine-tuning methods.

The paper tackles the inefficiency of fine-tuning deep learning models for video object segmentation by introducing a meta neural network called a modulator that adapts the segmentation model in a single forward pass, achieving similar accuracy while being 70 times faster.

Video object segmentation aims to segment a specific object throughout a video sequence, given only an annotated first frame. Recent deep learning based approaches do this effectively by fine-tuning a general-purpose segmentation model on the annotated frame using hundreds of iterations of gradient descent. Despite the high accuracy these methods achieve, the fine-tuning process is inefficient and fails to meet the requirements of real-world applications. We propose a novel approach that uses a single forward pass to adapt the segmentation model to the appearance of a specific object. Specifically, a second meta neural network, named the modulator, is learned to manipulate the intermediate layers of the segmentation network given limited visual and spatial information of the target object. The experiments show that our approach is 70 times faster than fine-tuning approaches while achieving similar accuracy.
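To make "manipulating the intermediate layers" concrete, here is a minimal NumPy sketch of one plausible form of modulation: a per-channel scale derived from an embedding of the target's appearance, plus a per-location bias derived from a spatial prior. The abstract does not specify the exact operations, so the function names (`modulate`, `toy_modulator`), the single linear map, and all shapes are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def modulate(features, channel_scale, spatial_bias):
    """Scale each channel and add a per-location bias to one feature map.

    features:      (C, H, W) activations of an intermediate layer
    channel_scale: (C,)   one scale per channel (assumed modulator output)
    spatial_bias:  (H, W) one bias per location (assumed modulator output)
    """
    return channel_scale[:, None, None] * features + spatial_bias[None, :, :]


def toy_modulator(object_embedding, location_prior, weights):
    """Hypothetical modulator: one linear map yields the channel scales,
    and the location prior is passed through unchanged as the bias."""
    channel_scale = weights @ object_embedding
    return channel_scale, location_prior


rng = np.random.default_rng(0)
C, H, W, D = 4, 3, 3, 8                    # illustrative sizes only
features = rng.standard_normal((C, H, W))  # stand-in for layer activations
embedding = rng.standard_normal(D)         # stand-in for the target's appearance
prior = np.zeros((H, W))
prior[1, 1] = 1.0                          # target assumed near the centre
weights = rng.standard_normal((C, D))      # would be learned in practice

# Adaptation is a single forward computation: no gradient steps are taken.
scale, bias = toy_modulator(embedding, prior, weights)
out = modulate(features, scale, bias)
```

The point of the sketch is the cost model: adapting to a new object means computing `scale` and `bias` once and applying them, whereas fine-tuning repeats forward and backward passes for hundreds of iterations.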

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
