A Reverse Hierarchy Model for Predicting Eye Fixations
This work addresses saliency detection for computer vision applications, but it is incremental as it builds on existing theories and achieves only competitive rather than superior performance.
The authors tackled the problem of predicting eye fixations in images by developing a computational model inspired by the reverse hierarchy theory, achieving competitive results with state-of-the-art models on two standard eye-tracking datasets.
A number of psychological and physiological evidences suggest that early visual attention works in a coarse-to-fine way, which lays a basis for the reverse hierarchy theory (RHT). This theory states that attention propagates from the top level of the visual hierarchy that processes gist and abstract information of input, to the bottom level that processes local details. Inspired by the theory, we develop a computational model for saliency detection in images. First, the original image is downsampled to different scales to constitute a pyramid. Then, saliency on each layer is obtained by image super-resolution reconstruction from the layer above, which is defined as unpredictability from this coarse-to-fine reconstruction. Finally, saliency on each layer of the pyramid is fused into stochastic fixations through a probabilistic model, where attention initiates from the top layer and propagates downward through the pyramid. Extensive experiments on two standard eye-tracking datasets show that the proposed method can achieve competitive results with state-of-the-art models.