Efficiency in Real-time Webcam Gaze Tracking
This work addresses efficiency challenges in gaze tracking for practical applications such as human-computer interaction. Its contribution is incremental: it optimizes existing methods rather than introducing a new paradigm.
The paper examines efficiency in real-time webcam gaze tracking by evaluating two trade-offs: computational speed versus accuracy for different CNN inputs, and calibration effort versus accuracy for different screen calibration methods. It finds that single-eye input and geometric regression calibration offer the best trade-off.
Efficiency and ease of use are essential for practical applications of camera-based eye/gaze tracking. Gaze tracking involves estimating where a person is looking on a screen based on face images from a computer-facing camera. In this paper we investigate two complementary forms of efficiency in gaze tracking: (1) the computational efficiency of the system, which is dominated by the inference speed of a CNN predicting gaze vectors; and (2) the usability efficiency, which is determined by the tediousness of the mandatory calibration of the gaze vector to a computer screen. To do so, we evaluate the computational speed/accuracy trade-off for the CNN and the calibration effort/accuracy trade-off for screen calibration. For the CNN, we evaluate full-face, two-eye, and single-eye input. For screen calibration, we measure the number of calibration points needed and evaluate three types of calibration: (1) pure geometry, (2) pure machine learning, and (3) hybrid geometric regression. Results suggest that single-eye input and geometric regression calibration achieve the best trade-off.
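
To make the step of mapping a gaze vector to a screen position more concrete, the sketch below shows one common geometric formulation: intersecting the gaze ray with the screen plane. The abstract does not specify how its pure geometry calibration is formulated, so this is an illustration under stated assumptions and not the paper's method. The function name `gaze_to_screen`, the coordinate frame, and the units (millimetres, with the screen axes given as unit vectors) are all hypothetical.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Sequence[float]


def dot(a: Vec3, b: Vec3) -> float:
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def sub(a: Vec3, b: Vec3) -> Tuple[float, ...]:
    """Component-wise difference a - b."""
    return tuple(x - y for x, y in zip(a, b))


def gaze_to_screen(
    eye_origin: Vec3,      # assumed: 3D eye position in the camera frame
    gaze_dir: Vec3,        # assumed: gaze vector predicted by the CNN
    screen_origin: Vec3,   # assumed: 3D position of the screen's reference corner
    screen_x_axis: Vec3,   # assumed: unit vector along the screen's horizontal edge
    screen_y_axis: Vec3,   # assumed: unit vector along the screen's vertical edge
    screen_normal: Vec3,   # assumed: unit normal of the screen plane
) -> Optional[Tuple[float, float]]:
    """Intersect the gaze ray with the screen plane.

    Returns the hit point in screen-plane coordinates, or None when the
    gaze is parallel to the screen or points away from it.
    """
    denom = dot(gaze_dir, screen_normal)
    if abs(denom) < 1e-9:
        return None  # gaze ray is parallel to the screen plane
    t = dot(sub(screen_origin, eye_origin), screen_normal) / denom
    if t <= 0:
        return None  # screen plane lies behind the eye
    hit = tuple(o + t * d for o, d in zip(eye_origin, gaze_dir))
    rel = sub(hit, screen_origin)
    return (dot(rel, screen_x_axis), dot(rel, screen_y_axis))


# Example: an eye 500 mm in front of the screen, looking straight at it.
point = gaze_to_screen(
    eye_origin=(100.0, 50.0, 500.0),
    gaze_dir=(0.0, 0.0, -1.0),
    screen_origin=(0.0, 0.0, 0.0),
    screen_x_axis=(1.0, 0.0, 0.0),
    screen_y_axis=(0.0, 1.0, 0.0),
    screen_normal=(0.0, 0.0, 1.0),
)
```

In a formulation like this, the screen pose relative to the camera is the quantity that calibration has to supply. A learned or hybrid approach would instead, or additionally, fit parameters from the recorded calibration points.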