Unsupervised Learning of Eye Gaze Representation from the Web
This work addresses eye gaze estimation for computer vision applications; it is incremental, building on existing unsupervised and feature-based techniques.
The paper tackles unsupervised learning of eye gaze representation by proposing Ize-Net, trained on a large web dataset of 154,251 images, and shows it learns a rich representation that can be fine-tuned for gaze estimation tasks.
Automatic eye gaze estimation has long interested researchers. In this paper, we propose an unsupervised-learning-based method for estimating the eye gaze region. To train the proposed network, Ize-Net, in a self-supervised manner, we collect a large 'in the wild' dataset of 154,251 images from the web. For each image in the dataset, we divide the gaze into three regions: an automatic technique first localizes the pupil centers, and a feature-based technique then determines the gaze region. Performance is evaluated on the Tablet Gaze and CAVE datasets by fine-tuning Ize-Net for eye gaze estimation. The learned feature representation is also used to train traditional machine learning algorithms for eye gaze estimation. The results demonstrate that the proposed method learns a rich data representation that can be efficiently fine-tuned for any eye gaze estimation dataset.
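The abstract does not say how the three gaze regions are defined or which features drive the region assignment. As a rough illustration of what pupil-center-based pseudo-labeling could look like, the sketch below assumes the regions are horizontal bins (left, center, right in image coordinates) and that each eye comes with two corner landmarks plus a localized pupil center. The `EyeLandmarks` type, the `pupil_ratio` and `gaze_region` helpers, and the 0.4/0.6 thresholds are all hypothetical; this is not the authors' labeling procedure.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass
class EyeLandmarks:
    """Hypothetical per-eye landmarks in image pixel coordinates."""
    left_corner: Point   # eye corner nearer the image's left edge
    right_corner: Point  # eye corner nearer the image's right edge
    pupil_center: Point  # assumed output of some pupil-center localizer


def pupil_ratio(eye: EyeLandmarks) -> float:
    """Project the pupil onto the corner-to-corner axis: 0.0 at left_corner, 1.0 at right_corner."""
    (x0, y0), (x1, y1) = eye.left_corner, eye.right_corner
    px, py = eye.pupil_center
    dx, dy = x1 - x0, y1 - y0
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        raise ValueError("eye corners coincide")
    t = ((px - x0) * dx + (py - y0) * dy) / length_sq
    return min(max(t, 0.0), 1.0)  # clamp pupils detected beyond the corners


# Hypothetical region names (image coordinates, not the subject's left/right).
REGIONS = ("left", "center", "right")


def gaze_region(left_eye: EyeLandmarks, right_eye: EyeLandmarks,
                low: float = 0.4, high: float = 0.6) -> str:
    """Average both eyes' pupil ratios and bin the result into one of three pseudo-labels."""
    ratio = (pupil_ratio(left_eye) + pupil_ratio(right_eye)) / 2
    if ratio < low:
        return REGIONS[0]
    if ratio > high:
        return REGIONS[2]
    return REGIONS[1]


# Usage: both pupils sit 20% of the way from the left corner.
eye_a = EyeLandmarks((0, 0), (10, 0), (2, 0))
eye_b = EyeLandmarks((20, 0), (30, 0), (22, 0))
print(gaze_region(eye_a, eye_b))  # prints: left
```

Averaging the two eyes is one simple way to reduce noise from a single mislocalized pupil; labels produced this way would serve only as self-supervision targets for pre-training, with the real gaze labels coming from the fine-tuning datasets.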