TPP-Gaze: Modelling Gaze Dynamics in Space and Time with Neural Temporal Point Processes
This work addresses the challenge of predicting when and where people look, a problem relevant to human-computer interaction and vision science. Its contribution is a novel integration of methods rather than an incremental improvement.
The paper tackles the problem of jointly modeling the spatial and temporal dynamics of visual attention in gaze scanpaths, and reports overall superior performance compared to state-of-the-art approaches across five datasets.
Attention guides our gaze to fixate the appropriate location in the scene and holds it there for the amount of time warranted by current processing demands, before shifting to the next one. As such, gaze deployment is crucially a temporal process. Existing computational models have made significant strides in predicting the spatial aspects of observers' visual scanpaths (where to look), while often relegating to the background the temporal facet of attention dynamics (when). In this paper we present TPP-Gaze, a novel and principled approach to modeling scanpath dynamics based on Neural Temporal Point Processes (TPPs), which jointly learns the temporal dynamics of fixation position and duration, integrating deep learning methodologies with point process theory. We conduct extensive experiments across five publicly available datasets. Our results show the overall superior performance of the proposed model compared to state-of-the-art approaches. Source code and trained models are publicly available at: https://github.com/phuselab/tppgaze.
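To make the idea of scoring "when" and "where" together more concrete, the following is a minimal toy sketch of a marked point process log-likelihood for a scanpath. It is not the TPP-Gaze model: the log-normal duration density, the isotropic Gaussian position density centred on the previous fixation, the fixed parameter values, and all function names are assumptions chosen for illustration. In a neural TPP these parameters would instead be produced by a network conditioned on the fixation history.

```python
import math

def lognormal_logpdf(x, mu, sigma):
    """Log-density of a log-normal distribution at x > 0."""
    return (-math.log(x * sigma * math.sqrt(2 * math.pi))
            - (math.log(x) - mu) ** 2 / (2 * sigma ** 2))

def gaussian2d_logpdf(pos, mean, sigma):
    """Log-density of an isotropic 2D Gaussian at pos."""
    dx, dy = pos[0] - mean[0], pos[1] - mean[1]
    return -math.log(2 * math.pi * sigma ** 2) - (dx * dx + dy * dy) / (2 * sigma ** 2)

def scanpath_log_likelihood(fixations, mu=-1.2, sigma_t=0.5, sigma_xy=0.2):
    """Joint log-likelihood of a scanpath given as (x, y, duration_s) tuples.

    Assumptions (hypothetical, for illustration only):
    - positions are in normalised image coordinates in [0, 1];
    - the scanpath starts from the image centre;
    - each duration is log-normal with fixed parameters (mu, sigma_t);
    - each position is Gaussian around the previous fixation.
    """
    total = 0.0
    prev = (0.5, 0.5)  # assumed starting point: image centre
    for x, y, duration in fixations:
        total += lognormal_logpdf(duration, mu, sigma_t)       # "when" term
        total += gaussian2d_logpdf((x, y), prev, sigma_xy)     # "where" term
        prev = (x, y)
    return total

# Two hypothetical scanpaths with identical durations but different jump sizes.
near = scanpath_log_likelihood([(0.5, 0.5, 0.3), (0.55, 0.5, 0.3)])
far = scanpath_log_likelihood([(0.5, 0.5, 0.3), (0.95, 0.95, 0.3)])
print(near > far)
```

Because each fixation contributes one duration term and one position term, the sum is a joint score over space and time; training a model of this kind would amount to maximising that sum over observed scanpaths.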