CV · Jul 27, 2018

Connecting Gaze, Scene, and Attention: Generalized Attention Estimation via Joint Modeling of Gaze and Scene Saliency

arXiv:1807.10437v1 · 140 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for more robust attention estimation in unconstrained environments, offering incremental improvements over existing methods.

The paper tackles the problem of estimating general visual attention in images across multiple social scenarios, achieving improved performance on gaze angle estimation and attention prediction tasks, with results showing a 15% increase in accuracy on the GazeFollow benchmark.

This paper addresses the challenging problem of estimating the general visual attention of people in images. Our proposed method is designed to work across multiple naturalistic social scenarios and provides a full picture of the subject's attention and gaze. In contrast, earlier works on gaze and attention estimation have focused on constrained problems in more specific contexts. In particular, our model explicitly represents the gaze direction and handles out-of-frame gaze targets. We leverage three different datasets using a multi-task learning approach. We evaluate our method on widely used benchmarks for single-tasks such as gaze angle estimation and attention-within-an-image, as well as on the new challenging task of generalized visual attention prediction. In addition, we have created extended annotations for the MMDB and GazeFollow datasets which are used in our experiments, which we will publicly release.

