CV · Aug 18, 2023

Investigation of Architectures and Receptive Fields for Appearance-based Gaze Estimation

arXiv:2308.09593v1 · 3 citations · h-index: 32
Originality: Incremental advance
AI Analysis

This work addresses gaze estimation for computer vision and human-computer interaction, but it is incremental as it focuses on optimizing existing architectures rather than introducing new paradigms.

The paper tackles the problem of appearance-based gaze estimation by showing that tuning simple parameters in a ResNet architecture outperforms most existing state-of-the-art methods, achieving gaze estimation errors of 3.64 degrees on ETH-XGaze, 4.50 degrees on MPIIFaceGaze, and 9.13 degrees on Gaze360.
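The errors quoted above are angles in degrees between a predicted and a ground-truth gaze direction. As a minimal sketch of how such an angular error is commonly computed, assuming both directions are given as 3D vectors (the function name `angular_error_deg` and the vector convention are illustrative assumptions, not taken from the paper):

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two 3D gaze direction vectors.

    Hypothetical helper for illustration; the vectors do not need
    to be unit length because they are normalised here.
    """
    dot = sum(p * t for p, t in zip(pred, target))
    norm_pred = math.sqrt(sum(p * p for p in pred))
    norm_target = math.sqrt(sum(t * t for t in target))
    # Clamp to [-1, 1] so rounding noise cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_pred * norm_target)))
    return math.degrees(math.acos(cosine))
```

A dataset-level figure such as those reported above would then typically be the mean of this value over all test samples.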

With the rapid development of deep learning technology in the past decade, appearance-based gaze estimation has attracted great attention from both the computer vision and human-computer interaction research communities. Many methods have been proposed with various mechanisms, including soft attention, hard attention, two-eye asymmetry, feature disentanglement, rotation consistency, and contrastive learning. Most of these methods take a single face or multiple regions as input, yet the basic architecture for gaze estimation has not been fully explored. In this paper, we show that tuning a few simple parameters of a ResNet architecture can outperform most of the existing state-of-the-art methods for the gaze estimation task on three popular datasets. From our extensive experiments, we conclude that the stride number, input image resolution, and multi-region architecture are critical for gaze estimation performance, while their effectiveness depends on the quality of the input face image. Taking ResNet-50 as the backbone, we obtain state-of-the-art gaze estimation errors of 3.64 degrees on ETH-XGaze, 4.50 degrees on MPIIFaceGaze, and 9.13 degrees on Gaze360.
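To make the role of stride and input resolution concrete, the following sketch computes the spatial size of the final feature map of a ResNet-style network using the standard convolution output-size formula. It assumes the common ResNet layout (a 7x7 stride-2 stem convolution, a 3x3 stride-2 max pool, then three downsampling stages); the helper names and the specific stride settings tried here are illustrative assumptions, since this summary does not state the exact configurations used in the paper.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet_feature_size(input_size, stage_strides=(2, 2, 2)):
    """Spatial size of the last feature map for a square input.

    Assumes the common ResNet layout; stage_strides holds the strides
    of the three downsampling stages (hypothetical tuning knob).
    """
    size = conv_out(input_size, kernel=7, stride=2, padding=3)  # stem conv
    size = conv_out(size, kernel=3, stride=2, padding=1)        # max pool
    for stride in stage_strides:
        # Each stage downsamples with a 3x3 convolution, padding 1.
        size = conv_out(size, kernel=3, stride=stride, padding=1)
    return size


for resolution in (224, 448):
    for strides in ((2, 2, 2), (2, 2, 1)):
        print(resolution, strides, resnet_feature_size(resolution, strides))
```

Under these assumptions, either doubling the input resolution or setting the last stage's stride to 1 doubles the side length of the final feature map, which means each output location covers a smaller fraction of the face image.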

Code Implementations: 1 repo