CV · Sep 3, 2018

Learning Saliency Prediction From Sparse Fixation Pixel Map

arXiv:1809.00644v1
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in saliency prediction: models are trained on blurred fixation blob maps rather than on the raw fixation pixels. It introduces a training approach for the sparse case, though the advance is incremental in scope.

The paper tackles saliency prediction by learning directly from sparse fixation pixel maps instead of blurred fixation blob maps, and reports performance competitive with state-of-the-art methods on multiple benchmark datasets.

Ground truth in saliency prediction datasets consists of two types of map: the fixation pixel map, which records human eye movements on sample images, and the fixation blob map, generated by applying Gaussian blurring to the corresponding fixation pixel map. Current saliency approaches predict by regressing the input image pixel-wise into a saliency map, with the fixation blob map as ground truth; learning saliency from the fixation pixel map has not been explored. In this work, we propose a first-of-its-kind approach to learning saliency prediction from sparse fixation pixel maps, together with a novel loss function for training on such sparse fixations. We use clustering to extract sparse fixation pixels from the raw fixation pixel map, and apply a max-pooling transformation to the output so that, when the loss is calculated, nearby but non-overlapping saliency pixels in the sparse outputs and labels are not falsely penalized. This approach provides a novel perspective on saliency prediction. We evaluate our approach on multiple benchmark datasets and achieve competitive performance on multiple metrics compared with state-of-the-art saliency methods.
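The effect of max-pooling the output can be illustrated with a small sketch. It is not the paper's loss function, which the abstract does not specify. The helper names `max_pool_same` and `fixation_miss_penalty`, the 3x3 window, and the toy 5x5 map are all assumptions made for illustration. The penalty here is a deliberately simple stand-in that looks only at the labeled fixation pixels.

```python
def max_pool_same(grid, k=3):
    """Stride-1 max pooling over a 2D list, keeping the input size.

    Windows are clipped at the borders instead of padded.
    The window size k is an assumed value, not taken from the paper.
    """
    h, w = len(grid), len(grid[0])
    r = k // 2
    return [
        [
            max(
                grid[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


def fixation_miss_penalty(pred, fixations):
    """Mean of (1 - prediction) over the labeled fixation pixels.

    Hypothetical stand-in for a loss term; it ignores non-fixated pixels.
    """
    return sum(1.0 - pred[y][x] for y, x in fixations) / len(fixations)


# Toy case: the label marks pixel (2, 2), the model fires one pixel to the right.
pred = [[0.0] * 5 for _ in range(5)]
pred[2][3] = 0.9
fixations = [(2, 2)]

raw_penalty = fixation_miss_penalty(pred, fixations)
pooled_penalty = fixation_miss_penalty(max_pool_same(pred), fixations)
print(raw_penalty, pooled_penalty)
```

Without pooling, the off-by-one prediction is scored as a complete miss at the labeled pixel. After pooling, the nearby response is carried over to that pixel and the penalty there drops sharply. How the pooled output is scored at non-fixated pixels is a separate design question that this sketch leaves open.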

