CV | Mar 27, 2023

Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention

arXiv:2303.15274v3 | 59 citations | h-index: 66
Originality: Highly original
AI Analysis

This paper addresses scalability and speed limitations in gaze prediction for HCI applications, though its contribution is an incremental improvement over existing methods.

The paper tackles goal-directed human gaze prediction for HCI by introducing Gazeformer, a model that encodes the search target with a natural language model instead of trained object detectors. It reports a large margin over other models in the ZeroGaze setting and runs more than five times faster than the state-of-the-art target-present visual search model.

Predicting human gaze is important in Human-Computer Interaction (HCI). However, to practically serve HCI applications, gaze prediction models must be scalable, fast, and accurate in their spatial and temporal gaze predictions. Recent scanpath prediction models focus on goal-directed attention (search). Such models are limited in their application due to a common approach relying on trained target detectors for all possible objects, and the availability of human gaze data for their training (both not scalable). In response, we pose a new task called ZeroGaze, a new variant of zero-shot learning where gaze is predicted for never-before-searched objects, and we develop a novel model, Gazeformer, to solve the ZeroGaze problem. In contrast to existing methods using object detector modules, Gazeformer encodes the target using a natural language model, thus leveraging semantic similarities in scanpath prediction. We use a transformer-based encoder-decoder architecture because transformers are particularly useful for generating contextual representations. Gazeformer surpasses other models by a large margin on the ZeroGaze setting. It also outperforms existing target-detection models on standard gaze prediction for both target-present and target-absent search tasks. In addition to its improved performance, Gazeformer is more than five times faster than the state-of-the-art target-present visual search model.
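To make the ZeroGaze setting concrete, here is a minimal sketch of how such an evaluation split could be built: every scanpath for one target category is held out, so that category is never searched for during training. The record layout (`target`, `fixations`), the toy data, and the function name `zerogaze_split` are illustrative assumptions, not the authors' code or dataset format.

```python
def zerogaze_split(scanpaths, held_out_target):
    """Split scanpath records so the held-out target never appears in training.

    Hypothetical helper: `scanpaths` is a list of dicts with a "target" key.
    """
    train = [s for s in scanpaths if s["target"] != held_out_target]
    test = [s for s in scanpaths if s["target"] == held_out_target]
    return train, test


# Toy records (made up for illustration): each has a search target and a
# list of (x, y, duration_ms) fixations.
scanpaths = [
    {"target": "fork", "fixations": [(120, 80, 210), (300, 150, 180)]},
    {"target": "knife", "fixations": [(90, 60, 250)]},
    {"target": "mug", "fixations": [(200, 220, 190), (210, 230, 300)]},
    {"target": "fork", "fixations": [(50, 40, 160)]},
]

train, test = zerogaze_split(scanpaths, held_out_target="fork")
train_targets = {s["target"] for s in train}
test_targets = {s["target"] for s in test}
```

Under this split, a model sees no gaze data for "fork" during training and must rely on other cues, such as the semantic similarity between "fork" and "knife" that a language-model encoding of the target can provide.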

Code Implementations: 1 repo
