CV · AI · Jun 9, 2022

GASP: Gated Attention For Saliency Prediction

arXiv:2206.04590v1 · 7 citations · h-index: 46
Originality: Incremental advance
AI Analysis

This work addresses the challenge of modeling human attention more accurately in computer vision. The advance is incremental: it builds on existing dynamic saliency models by adding social features.

The paper tackles saliency prediction by integrating social cues such as gaze direction and affect, reporting at least a 5% improvement in prediction-to-ground-truth correspondence over dynamic saliency models without these cues.

Saliency prediction refers to the computational task of modeling overt attention. Social cues greatly influence our attention, consequently altering our eye movements and behavior. To emphasize the efficacy of such features, we present a neural model for integrating social cues and weighting their influences. Our model consists of two stages. During the first stage, we detect two social cues by following gaze, estimating gaze direction, and recognizing affect. These features are then transformed into spatiotemporal maps through image processing operations. The transformed representations are propagated to the second stage (GASP), where we explore various techniques of late fusion for integrating social cues and introduce two sub-networks for directing attention to relevant stimuli. Our experiments indicate that fusion approaches achieve better results for static integration methods, whereas non-fusion approaches, for which the influence of each modality is unknown, result in better outcomes when coupled with recurrent models for dynamic saliency prediction. We show that gaze direction and affective representations contribute a prediction-to-ground-truth correspondence improvement of at least 5% compared to dynamic saliency models without social cues. Furthermore, affective representations improve GASP, supporting the necessity of considering affect-biased attention in predicting saliency.
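The idea of late fusion with learned weighting of modalities can be illustrated with a small NumPy sketch. This is a generic gated-fusion example under stated assumptions, not the GASP architecture itself: the function name `gated_late_fusion`, the per-pixel linear gate, and the softmax normalization across modalities are all hypothetical choices made for illustration. Each modality (for example a saliency map, a gaze map, and an affect map) is assumed to be already rendered as an `H x W` map.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def gated_late_fusion(maps, weights, bias):
    """Fuse M modality maps into one map with per-pixel gates.

    maps:    array of shape (M, H, W), one map per modality (assumed input).
    weights: array of shape (M, M), hypothetical gate parameters acting
             like a 1x1 convolution over the modality axis.
    bias:    array of shape (M,), hypothetical gate bias.
    Returns the fused map (H, W) and the gates (M, H, W).
    """
    # Gate logits: for every pixel, mix the M modality values linearly.
    logits = np.einsum("km,mhw->khw", weights, maps) + bias[:, None, None]
    # Normalize so that the gates of all modalities sum to 1 at each pixel.
    gates = softmax(logits, axis=0)
    # Weighted sum of the modality maps; the gates expose each
    # modality's influence on the fused prediction.
    fused = (gates * maps).sum(axis=0)
    return fused, gates


# Toy usage: three random 4x5 maps standing in for three modalities.
rng = np.random.default_rng(0)
maps = rng.random((3, 4, 5))
weights = np.zeros((3, 3))  # untrained gate: every modality weighted equally
bias = np.zeros(3)
fused, gates = gated_late_fusion(maps, weights, bias)
```

With zero-valued gate parameters the gates are uniform, so the fused map reduces to the plain average of the input maps. In a trained model the gate parameters would be learned, and inspecting `gates` would indicate how strongly each modality contributes at each location.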

Code Implementations: 1 repo