CVFeb 2, 2017

Video Salient Object Detection via Fully Convolutional Networks

arXiv:1702.00871v3605 citations
AI Analysis

It improves video salient object detection for applications like video editing and surveillance, but is incremental as it builds on existing fully convolutional networks.

The paper tackles the problem of detecting salient regions in videos by proposing a deep learning model that addresses data scarcity and speed issues, achieving state-of-the-art results with MAE of 0.06 on DAVIS and 0.07 on FBMS datasets at 2fps.

This paper proposes a deep learning model to efficiently detect salient regions in videos. It addresses two important issues: (1) deep video saliency model training with the absence of sufficiently large and pixel-wise annotated video data, and (2) fast video saliency training and detection. The proposed deep video saliency network consists of two modules, for capturing the spatial and temporal saliency information, respectively. The dynamic saliency model, explicitly incorporating saliency estimates from the static saliency model, directly produces spatiotemporal saliency inference without time-consuming optical flow computation. We further propose a novel data augmentation technique that simulates video training data from existing annotated image datasets, which enables our network to learn diverse saliency information and prevents overfitting with the limited number of training videos. Leveraging our synthetic video data (150K video sequences) and real videos, our deep video saliency model successfully learns both spatial and temporal saliency cues, thus producing accurate spatiotemporal saliency estimate. We advance the state-of-the-art on the DAVIS dataset (MAE of .06) and the FBMS dataset (MAE of .07), and do so with much improved speed (2fps with all steps).

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes