Visual Attention driven by Convolutional Features
This work addresses the problem of predicting where humans look in a scene, a question relevant to computer vision applications. The contribution is incremental, since it builds on existing deep learning and eye-movement models.
The paper tackles the prediction of human visual attention in two steps: it uses deep convolutional neural networks trained for object classification to generate saliency maps, and it integrates these maps with a bottom-up model of eye movements to simulate scanpaths. The reported results on saliency prediction and on similarity scores with human scanpaths support the effectiveness of the approach.
Understanding where humans look in a scene is a problem of great interest in visual perception and computer vision. When eye-tracking devices are not a viable option, models of human attention can be used to predict fixations. In this paper we make two contributions. First, we present a model of visual attention based simply on deep convolutional neural networks trained for object classification tasks. We define a method for visualizing saliency maps and evaluate it on a saliency prediction task. Second, we integrate the information of these maps with a bottom-up differential model of eye movements to simulate visual attention scanpaths. Results on saliency prediction and similarity scores with human scanpaths demonstrate the effectiveness of this model.
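The abstract does not give the model's equations, so what follows is only a minimal Python/NumPy sketch of the two-stage pipeline under stated assumptions, not the authors' method. For the saliency step it assumes, hypothetically, that a map can be obtained by summing the rectified activations of one convolutional layer over channels and upsampling the result; the feature tensor is assumed to come from a pretrained classification network that is not shown. For the scanpath step it swaps in a simple first-order gradient flow, dx/dt = gain * grad S(x), as a stand-in for the paper's bottom-up differential model. The names `saliency_from_features` and `simulate_scanpath` and the parameters `scale`, `gain`, `dt` and `steps` are illustrative only.

```python
import numpy as np


def saliency_from_features(features: np.ndarray, scale: int) -> np.ndarray:
    """Hypothetical saliency map: channel sum of rectified CNN activations.

    features: array of shape (C, h, w), assumed to be the output of a
    convolutional layer of a classification network (computed elsewhere).
    scale: integer upsampling factor back to image resolution.
    """
    rectified = np.maximum(features, 0.0)            # keep positive activations only
    summed = rectified.sum(axis=0)                   # collapse channels -> (h, w)
    upsampled = np.repeat(np.repeat(summed, scale, axis=0), scale, axis=1)
    peak = upsampled.max()
    return upsampled / peak if peak > 0 else upsampled  # normalise to [0, 1]


def simulate_scanpath(saliency: np.ndarray, start, steps: int = 200,
                      dt: float = 1.0, gain: float = 100.0) -> np.ndarray:
    """Toy gaze dynamics (stand-in model): dx/dt = gain * grad S(x).

    Integrated with explicit Euler steps; the gradient is read at the
    nearest pixel. Returns an array of shape (steps + 1, 2) of (row, col).
    """
    # Central differences in the interior, one-sided at the borders.
    grad_r, grad_c = np.gradient(saliency)
    h, w = saliency.shape
    pos = np.asarray(start, dtype=float)
    path = [pos.copy()]
    for _ in range(steps):
        r = int(np.clip(round(pos[0]), 0, h - 1))
        c = int(np.clip(round(pos[1]), 0, w - 1))
        velocity = gain * np.array([grad_r[r, c], grad_c[r, c]])
        pos = pos + dt * velocity                    # explicit Euler step
        pos = np.clip(pos, [0, 0], [h - 1, w - 1])   # keep gaze inside the image
        path.append(pos.copy())
    return np.array(path)


if __name__ == "__main__":
    # Synthetic saliency map with a single Gaussian peak at row 40, column 24.
    yy, xx = np.mgrid[0:64, 0:64]
    toy = np.exp(-((yy - 40) ** 2 + (xx - 24) ** 2) / (2 * 20.0 ** 2))
    path = simulate_scanpath(toy, start=(5, 60))
    print(path[-1])  # ends near the peak at row 40, column 24
```

Because a pure gradient flow stops at the nearest saliency maximum, this toy model yields a single fixation target; producing multi-fixation scanpaths would need an extra mechanism (for example inhibition of return), which the sketch deliberately omits. Gradients also vanish far from any peak, so `gain` has to be tuned to the scale of the map.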