An Element Sensitive Saliency Model with Position Prior Learning for Web Pages
This work addresses saliency prediction for web pages, a domain-specific problem with multimedia applications. The contribution is incremental, since it builds on existing methods for learning from eye-tracking data.
The paper tackles the problem of predicting visual attention on web pages, which have diverse content and layouts. It proposes a deep generative saliency model with position prior learning and element-sensitive branches, and reports that the model outperforms state-of-the-art models on the FiWI dataset.
Understanding human visual attention is important for multimedia applications. Many studies have attempted to learn from eye-tracking data and build computational saliency prediction models. However, limited effort has been devoted to saliency prediction for Web pages, which are characterized by more diverse content elements and spatial layouts. In this paper, we propose a novel end-to-end deep generative saliency model for Web pages. To capture position biases introduced by page layouts, a Position Prior Learning sub-network is proposed, which models position biases as a multivariate Gaussian distribution using a variational auto-encoder. To model different elements of a Web page, a Multi Discriminative Region Detection (MDRD) branch and a Text Region Detection (TRD) branch are introduced, which aim to extract discriminative localizations and "prominent" text regions likely to correspond to human attention, respectively. We validate the proposed model on FiWI, a public Web-page dataset, and show that it outperforms state-of-the-art models for Web-page saliency prediction.
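The abstract says position biases are modeled as a multivariate Gaussian distribution with a variational auto-encoder, but gives no architectural details. The following is a minimal sketch of the two generic building blocks such a model usually relies on: the reparameterized sampling step and the KL divergence to a standard normal prior. It assumes a diagonal covariance, which the abstract does not state, and the function names `reparameterize` and `kl_to_standard_normal` are hypothetical. It is not the authors' implementation.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one draw per latent dimension.

    Assumption: diagonal covariance, parameterized by the log-variance.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


# Illustrative values only; the latent size 4 is arbitrary.
rng = random.Random(0)
mu = [0.0, 0.0, 0.0, 0.0]
log_var = [0.0, 0.0, 0.0, 0.0]
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

In a full model, an encoder network would predict `mu` and `log_var`, and the KL term would be added to the saliency reconstruction loss. How the paper weights or combines these terms is not specified in the abstract.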