Visual Saliency Model using SIFT and Comparison of Learning Approaches
This work addresses the computer modeling of human visual attention. Its contribution is incremental: it extends existing feature sets with SIFT features and compares learning approaches.
The study tackles the problem of predicting human visual saliency in images. It uses SIFT features alongside traditional low-, medium-, and high-level features, and compares several machine learning methods to find the combination of feature set and learning method that gives the best classification accuracy on a large eye-tracking dataset.
Humans' ability to detect and locate salient objects in images is remarkably fast and successful. Measuring this behavior with eye-tracking equipment is expensive and cannot be easily applied, and modeling it computationally is still an open problem. Our study uses one of the largest public eye-tracking databases, which contains the fixation points of 15 observers on 1003 images. In addition to the low-, medium-, and high-level features used in previous studies, SIFT features extracted from the images are used to improve the classification accuracy of the models. A second contribution of this paper is the comparison and statistical analysis of different machine learning methods that can be used to train our model. As a result, the best feature set and learning model for predicting where humans look in images are determined.
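The abstract does not say which statistical test was used to compare the learning methods. As a minimal sketch of one common choice, the snippet below computes a paired t statistic over per-fold accuracies of two models evaluated on the same cross-validation folds. The function name `paired_t_statistic` and the accuracy values are hypothetical and serve only as illustration; they are not results from the paper.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired per-fold scores.

    Assumes both models were scored on the same folds, in the same order.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must be scored on the same folds")
    if len(scores_a) < 2:
        raise ValueError("need at least two folds")

    # Per-fold differences remove the variation shared by both models.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0.0:
        raise ValueError("differences have zero variance; t is undefined")

    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical per-fold accuracies (illustrative only, not from the paper).
baseline = [0.80, 0.82, 0.79, 0.81, 0.83]   # e.g. features without SIFT
with_sift = [0.82, 0.83, 0.82, 0.82, 0.86]  # e.g. the same features plus SIFT

t, dof = paired_t_statistic(with_sift, baseline)
print(f"t = {t:.2f}, dof = {dof}")
```

The resulting statistic would be compared against a t distribution with `dof` degrees of freedom to judge whether the accuracy difference between two feature sets or two learners is likely to be systematic rather than due to fold-to-fold variation.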