CV, Mar 11, 2016

Learning Gaze Transitions from Depth to Improve Video Saliency Estimation

arXiv:1603.03669v1, 45 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of predicting human attention in 3D video content for applications where 3D displays are impractical. The advance is incremental: it builds on existing video saliency methods by adding depth.

The paper tackles video saliency estimation for RGBD videos viewed on 2D screens. It trains a generative CNN that predicts each frame's saliency map from the previous frame's fixation map, and reports a 15% relative improvement over state-of-the-art methods.

In this paper we introduce a novel Depth-Aware Video Saliency approach to predict human focus of attention when viewing RGBD videos on regular 2D screens. We train a generative convolutional neural network which predicts a saliency map for a frame, given the fixation map of the previous frame. Saliency estimation in this scenario is highly important since in the near future 3D video content will be easily acquired and yet hard to display. This can be explained, on the one hand, by the dramatic improvement of 3D-capable acquisition equipment. On the other hand, despite the considerable progress in 3D display technologies, most of the 3D displays are still expensive and require wearing special glasses. To evaluate the performance of our approach, we present a new comprehensive database of eye-fixation ground-truth for RGBD videos. Our experiments indicate that integrating depth into video saliency calculation is beneficial. We demonstrate that our approach outperforms state-of-the-art methods for video saliency, achieving 15% relative improvement.
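The "15% relative improvement" in the abstract is a ratio against the baseline score, not a difference in percentage points. The following is a minimal sketch of that arithmetic. The function name and both scores are hypothetical and are not taken from the paper, and it assumes a metric where higher is better.

```python
def relative_improvement(baseline: float, proposed: float) -> float:
    """Return the gain of `proposed` over `baseline` as a fraction of the baseline.

    Assumes a higher-is-better metric; the name is illustrative only.
    """
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (proposed - baseline) / abs(baseline)


# Hypothetical scores chosen for illustration, NOT results from the paper.
baseline_score = 0.60
proposed_score = 0.69

gain = relative_improvement(baseline_score, proposed_score)
print(f"relative improvement: {gain:.0%}")
```

With these made-up numbers the absolute gain is 0.09, while the relative gain is 0.09 / 0.60, which is the kind of figure the abstract quotes.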
