CV · May 9, 2023

EFE: End-to-end Frame-to-Gaze Estimation

arXiv:2305.05526v1 · 28 citations
Originality: Incremental advance
AI Analysis

This work addresses the costly, implementation-specific preprocessing steps in gaze estimation, which affect applications such as human-computer interaction.

The paper tackles the problem of expensive and error-prone eye/face cropping in gaze estimation by proposing an end-to-end method that directly predicts 3D gaze from raw frames, achieving comparable results to state-of-the-art methods on three public datasets.

Despite the recent development of learning-based gaze estimation methods, most methods require one or more eye or face region crops as inputs and produce a gaze direction vector as output. Cropping yields higher resolution in the eye regions and fewer confounding factors (such as clothing and hair), both of which are believed to benefit the final model performance. However, this eye/face patch cropping process is expensive, error-prone, and implementation-specific for different methods. In this paper, we propose a frame-to-gaze network that directly predicts both 3D gaze origin and 3D gaze direction from the raw camera frame without any face or eye cropping. Our method demonstrates that direct gaze regression from the raw downscaled frame, from FHD/HD to VGA/HVGA resolution, is possible despite the challenge of having very few pixels in the eye region. The proposed method achieves results comparable to state-of-the-art methods in Point-of-Gaze (PoG) estimation on three public gaze datasets (GazeCapture, MPIIFaceGaze, and EVE) and generalizes well to extreme camera view changes.
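
The abstract reports predicting a 3D gaze origin and a 3D gaze direction and evaluating on Point-of-Gaze. As a minimal sketch of how the two predictions can be combined into a PoG, the snippet below intersects the gaze ray with a screen plane. It is a generic ray-plane intersection, not necessarily the paper's exact formulation; the function name `point_of_gaze`, the millimetre units, and the default plane (through the camera origin with normal along z) are assumptions made for illustration.

```python
def point_of_gaze(origin, direction,
                  plane_point=(0.0, 0.0, 0.0),
                  plane_normal=(0.0, 0.0, 1.0)):
    """Intersect the gaze ray origin + t * direction with a plane.

    All vectors are 3-tuples in the same coordinate system. The default
    plane (z = 0) is an illustrative assumption, not taken from the paper.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    denom = dot(plane_normal, direction)
    if abs(denom) < 1e-9:
        raise ValueError("gaze ray is parallel to the screen plane")

    # Solve n . (origin + t * direction - plane_point) = 0 for t.
    diff = tuple(p - o for p, o in zip(plane_point, origin))
    t = dot(plane_normal, diff) / denom
    if t < 0:
        raise ValueError("screen plane lies behind the gaze origin")

    return tuple(o + t * d for o, d in zip(origin, direction))


# Hypothetical values: eye 600 mm in front of the plane, looking toward it.
pog = point_of_gaze((0.0, 0.0, 600.0), (0.1, -0.2, -1.0))
```

With these hypothetical inputs the ray reaches the plane at t = 600, so the PoG lies at roughly (60, -120, 0) in the same units as the origin. Converting that point to screen pixels would additionally need the screen's physical size and resolution, which this sketch leaves out.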

