CV · HC · LG · Apr 10, 2024

A Transformer-Based Model for the Prediction of Human Gaze Behavior on Videos

Suleyman Ozdel, Yao Rong, Berat Mert Albaba, Yen-Ling Kuo, Xi Wang, Enkelejda Kasneci

ETH Zurich

arXiv:2404.07351v1 · ETRA

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of automating video understanding with eye-tracking data, but it appears incremental, since it applies existing transformer and reinforcement learning techniques to a specific domain.

The paper tackles the problem of simulating human gaze behavior on videos to automate video analysis. It introduces a transformer-based reinforcement learning method that replicates human gaze patterns and can be applied to downstream tasks such as activity recognition.

Eye-tracking applications that use human gaze in video understanding tasks have become increasingly important. To effectively automate the process of video analysis based on eye-tracking data, it is important to accurately replicate human gaze behavior. However, this task presents significant challenges due to the inherent complexity and ambiguity of human gaze patterns. In this work, we introduce a novel method for simulating human gaze behavior. Our approach uses a transformer-based reinforcement learning algorithm to train an agent that acts as a human observer, with the primary role of watching videos and simulating human gaze behavior. We employed an eye-tracking dataset gathered from videos generated by the VirtualHome simulator, with a primary focus on activity recognition. Our experimental results demonstrate the effectiveness of our gaze prediction method by highlighting its capability to replicate human gaze behavior and its applicability to downstream tasks where real human gaze is used as input.
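The abstract describes an agent that watches a video and decides where to look, trained with a transformer-based reinforcement learning algorithm. The following is a minimal toy sketch of that general idea under stated assumptions, not the authors' implementation. It assumes each frame is pre-split into a hypothetical grid of cells with feature vectors, uses a single parameter-free self-attention layer with a residual connection as a stand-in for the transformer, rewards the agent when its sampled fixation cell matches a synthetic "human" fixation, and updates only a linear scoring head with the REINFORCE policy gradient. All names (`N_CELLS`, `make_frame`, `policy`, `train`) and the reward definition are illustrative assumptions.

```python
import math
import random

# Hypothetical toy setup (not the paper's model): each frame is split into
# N_CELLS grid cells, each described by a D-dimensional feature vector.
N_CELLS, D = 9, 3


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity Q/K/V
    projections plus a residual connection (a simplification for illustration)."""
    scale = math.sqrt(len(tokens[0]))
    out = []
    for q in tokens:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in tokens])
        mixed = [sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(len(q))]
        out.append([x + m for x, m in zip(q, mixed)])  # residual connection
    return out


def policy(w, tokens):
    """Return contextualized tokens and a distribution over fixation cells."""
    h = self_attention(tokens)
    logits = [sum(wi * hi for wi, hi in zip(w, tok)) for tok in h]
    return h, softmax(logits)


def make_frame(rng):
    """Synthetic frame with one 'salient' cell that the simulated human fixates."""
    target = rng.randrange(N_CELLS)
    tokens = [[0.0, rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(N_CELLS)]
    tokens[target] = [1.0, 0.0, 0.0]
    return tokens, target


def mean_target_prob(w, frames):
    """Average probability the policy assigns to the human-fixated cell."""
    return sum(policy(w, toks)[1][t] for toks, t in frames) / len(frames)


def train(episodes=300, lr=0.5, seed=0):
    rng = random.Random(seed)
    w = [0.0] * D      # only the linear scoring head is learned in this sketch
    baseline = 0.0     # running-average reward baseline to reduce variance
    for _ in range(episodes):
        tokens, human_fix = make_frame(rng)
        h, probs = policy(w, tokens)
        action = rng.choices(range(N_CELLS), weights=probs)[0]  # sampled fixation
        reward = 1.0 if action == human_fix else 0.0              # matches human gaze?
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # REINFORCE for a linear-softmax head: grad log pi(a) = h_a - sum_i p_i h_i
        expected_h = [sum(p * tok[d] for p, tok in zip(probs, h)) for d in range(D)]
        for d in range(D):
            w[d] += lr * advantage * (h[action][d] - expected_h[d])
    return w


eval_frames = [make_frame(random.Random(100 + i)) for i in range(50)]
before = mean_target_prob([0.0] * D, eval_frames)
after = mean_target_prob(train(), eval_frames)
print(f"mean p(human fixation) before: {before:.3f}, after: {after:.3f}")
```

Keeping the attention parameter-free gives the policy gradient a closed form for the linear head (the chosen token's features minus their expectation under the policy). A realistic model would instead learn query, key, and value projections end to end, typically with an autodiff framework, and would use real eye-tracking fixations rather than a synthetic salient cell.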
