Neural Networks for Semantic Gaze Analysis in XR Settings
This work addresses the problem of making eye-tracking data in VR/AR settings economical to use for researchers and developers, though it appears incremental because it builds on existing object recognition techniques.
The paper tackles the resource-intensive task of semantic gaze analysis in interactive 3D XR scenes. It presents an approach that uses CNNs trained on synthetic data to minimize the time and information needed to annotate volumes of interest, and it shows that the approach can compete with state-of-the-art methods without relying on markers or preexisting databases.
Virtual-reality (VR) and augmented-reality (AR) technology is increasingly combined with eye-tracking. This combination broadens both fields and opens up new areas of application, in which visual perception and related cognitive processes can be studied in interactive but still well-controlled settings. However, performing a semantic gaze analysis of eye-tracking data from interactive three-dimensional scenes is a resource-intensive task, which so far has been an obstacle to economic use. In this paper we present a novel approach that minimizes the time and information necessary to annotate volumes of interest (VOIs) by using techniques from object recognition. To do so, we train convolutional neural networks (CNNs) on synthetic data sets derived from virtual models using image augmentation techniques. We evaluate our method in real and virtual environments, showing that it can compete with state-of-the-art approaches while not relying on additional markers or preexisting databases, and while offering cross-platform use.
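To make the data-preparation idea concrete, the following is a minimal sketch of how a synthetic training set might be built from rendered views of virtual models, and how an image patch around a gaze point might be cut out for classification. It is an illustration under stated assumptions, not the pipeline used in the paper: the abstract does not specify which augmentations are applied, so the flip, brightness, and noise steps are placeholders, and the function names `augment`, `make_synthetic_set`, and `crop_around_gaze` are hypothetical. The CNN itself is omitted.

```python
import numpy as np


def augment(image, rng):
    """Return a randomly perturbed copy of an H x W x 3 float image in [0, 1].

    The specific perturbations are assumptions chosen for illustration.
    """
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                 # horizontal flip
    out = out * rng.uniform(0.7, 1.3)         # random brightness gain
    out = out + rng.normal(0.0, 0.02, size=out.shape)  # mild sensor-like noise
    return np.clip(out, 0.0, 1.0)


def make_synthetic_set(renders, n_per_class, seed=0):
    """Build an augmented data set from one rendered image per VOI label.

    renders: dict mapping a VOI label to an H x W x 3 array (all same shape).
    Returns a stacked image array and a parallel list of labels.
    """
    rng = np.random.default_rng(seed)
    images, labels = [], []
    for label, render in renders.items():
        for _ in range(n_per_class):
            images.append(augment(render, rng))
            labels.append(label)
    return np.stack(images), labels


def crop_around_gaze(frame, gaze_xy, size):
    """Cut a size x size patch centred on the gaze point (x, y) in pixels.

    The frame is edge-padded so that gaze points near the border still
    yield a full-sized patch.
    """
    half = size // 2
    padded = np.pad(frame, ((half, half), (half, half), (0, 0)), mode="edge")
    x, y = gaze_xy
    return padded[y:y + size, x:x + size, :]
```

In such a setup, patches produced by `crop_around_gaze` from recorded scene frames would be passed to a classifier trained on the output of `make_synthetic_set`, so that each gaze sample receives a VOI label without markers placed in the scene.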