CV, AI, HC, LG | Apr 15, 2024

EyeFormer: Predicting Personalized Scanpaths with Transformer-Guided Reinforcement Learning

arXiv:2404.10163v2 | 20 citations | h-index: 16 | UIST
Originality: Incremental advance
AI Analysis

This work addresses the need for personalized scanpath prediction in graphical user interfaces, enabling applications like layout optimization, though it is incremental as it builds on existing attention prediction methods.

The authors address personalized scanpath prediction for individual viewers, which existing models do not support. They introduce EyeFormer, a model that uses Transformer-guided reinforcement learning to predict full scanpath information, including fixation positions and durations, and that personalizes its predictions from a few scanpath samples of a given user.

From a visual perception perspective, modern graphical user interfaces (GUIs) comprise a complex graphics-rich two-dimensional visuospatial arrangement of text, images, and interactive objects such as buttons and menus. While existing models can accurately predict regions and objects that are likely to attract attention "on average", so far there is no scanpath model capable of predicting scanpaths for an individual. To close this gap, we introduce EyeFormer, which leverages a Transformer architecture as a policy network to guide a deep reinforcement learning algorithm that controls gaze locations. Our model has the unique capability of producing personalized predictions when given a few user scanpath samples. It can predict full scanpath information, including fixation positions and duration, across individuals and various stimulus types. Additionally, we demonstrate applications in GUI layout optimization driven by our model. Our software and models will be publicly available.
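To make "full scanpath information" concrete, the sketch below represents a scanpath as a sequence of fixations, each with a position and a duration, and samples one step by step from a stochastic policy. It is only an illustration under stated assumptions, not EyeFormer's architecture or training procedure: `Fixation`, `toy_policy`, and `rollout` are hypothetical names, coordinates are assumed to be normalized to [0, 1], durations are assumed to be in seconds, and a hand-written rule stands in for the learned Transformer policy network.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Fixation:
    x: float         # assumed: horizontal position normalized to [0, 1]
    y: float         # assumed: vertical position normalized to [0, 1]
    duration: float  # assumed: fixation duration in seconds


# A policy maps the fixations so far to the mean of the next (x, y, duration).
Policy = Callable[[List[Fixation]], Tuple[float, float, float]]


def toy_policy(history: List[Fixation]) -> Tuple[float, float, float]:
    """Hypothetical stand-in for a learned policy: drift left to right, row by row."""
    step = len(history)
    mean_x = (0.2 + 0.15 * step) % 1.0
    mean_y = 0.2 + 0.1 * (step // 5)
    return mean_x, mean_y, 0.25


def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def rollout(policy: Policy, n_fixations: int, noise: float = 0.05,
            seed: Optional[int] = None) -> List[Fixation]:
    """Sample a scanpath by drawing each fixation around the policy's predicted mean."""
    rng = random.Random(seed)
    path: List[Fixation] = []
    for _ in range(n_fixations):
        mean_x, mean_y, mean_d = policy(path)
        path.append(Fixation(
            x=clamp(rng.gauss(mean_x, noise), 0.0, 1.0),
            y=clamp(rng.gauss(mean_y, noise), 0.0, 1.0),
            duration=max(0.05, rng.gauss(mean_d, noise)),  # keep durations positive
        ))
    return path


if __name__ == "__main__":
    for fixation in rollout(toy_policy, n_fixations=5, seed=0):
        print(fixation)
```

In a reinforcement learning setup, rollouts like this would be scored by a reward and used to update the policy. That step is omitted here because the abstract does not specify the reward or the update rule.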

