CV · Nov 24, 2017

MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation

Xucong Zhang, Yusuke Sugano, Mario Fritz, Andreas Bulling

arXiv:1711.09017v1 27.6576 citations

Originality: Incremental advance

AI Analysis

This work addresses gaze estimation for human-computer interaction by providing a more realistic dataset and a novel deep learning method, though it advances existing approaches only incrementally.

The authors tackled the problem of unconstrained gaze estimation from monocular RGB images by introducing MPIIGaze, a real-world dataset collected during everyday laptop use, and GazeNet, a deep appearance-based method that improved state-of-the-art cross-dataset performance by 22%, reducing mean error from 13.9 to 10.8 degrees.

Learning-based methods are believed to work well for unconstrained gaze estimation, i.e. gaze estimation from a monocular RGB camera without assumptions regarding user, environment, or camera. However, current gaze datasets were collected under laboratory conditions and methods were not evaluated across multiple datasets. Our work makes three contributions towards addressing these limitations. First, we present the MPIIGaze dataset, which contains 213,659 full face images and corresponding ground-truth gaze positions collected from 15 users during everyday laptop use over several months. An experience sampling approach ensured continuous gaze and head poses and realistic variation in eye appearance and illumination. To facilitate cross-dataset evaluations, 37,667 images were manually annotated with eye corners, mouth corners, and pupil centres. Second, we present an extensive evaluation of state-of-the-art gaze estimation methods on three current datasets, including MPIIGaze. We study key challenges including target gaze range, illumination conditions, and facial appearance variation. We show that image resolution and the use of both eyes affect gaze estimation performance while head pose and pupil centre information are less informative. Finally, we propose GazeNet, the first deep appearance-based gaze estimation method. GazeNet improves the state of the art by 22% (from a mean error of 13.9 degrees to 10.8 degrees) for the most challenging cross-dataset evaluation.
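
The reported 22% figure can be checked from the two mean errors given in the abstract. The sketch below does that arithmetic and also shows one common way to measure gaze error in degrees, as the angle between a predicted and a ground-truth 3D gaze direction. The abstract does not spell out the error metric, so the angular-error definition and the function names `angular_error_deg` and `relative_improvement` are illustrative assumptions, not the authors' code.

```python
import math


def angular_error_deg(pred, true):
    """Angle in degrees between two 3D gaze direction vectors.

    Assumption: error is the angle between direction vectors; the
    abstract only says "mean error" in degrees.
    """
    dot = sum(p * t for p, t in zip(pred, true))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in true))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def relative_improvement(baseline, new):
    """Fractional reduction in error relative to the baseline."""
    return (baseline - new) / baseline


# Mean errors quoted in the abstract (degrees).
improvement = relative_improvement(13.9, 10.8)
print(f"{improvement:.1%}")  # 22.3%
```

The absolute reduction is 3.1 degrees; dividing by the 13.9-degree baseline gives roughly 22%, which matches the figure quoted in the abstract.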
