CV · Jul 5, 2021

Gaze Estimation with an Ensemble of Four Architectures

arXiv:2107.01980v1 · 17 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses gaze estimation for computer vision applications, but it is incremental as it combines existing architectures without introducing a new method.

The paper tackles gaze estimation from face images by ensembling the predictions of six estimators built on four network architectures. It took first place in the ETH-XGaze Competition with an average angular error of 3.11°.

This paper presents a method for gaze estimation from face images. We train several gaze estimators using four different network architectures: one designed for gaze estimation (i.e., iTracker-MHSA) and three originally designed for general computer vision tasks (i.e., BoTNet, HRNet, ResNeSt). We then select the best six estimators and ensemble their predictions through a linear combination. The method ranks first on the leaderboard of the ETH-XGaze Competition, achieving an average angular error of $3.11^{\circ}$ on the ETH-XGaze test set.
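As a rough illustration of the two ideas named in the abstract, the sketch below linearly combines the outputs of several estimators and scores the result by mean angular error. It is not the authors' code. The abstract does not say how the combination weights were chosen, so the least-squares fit on held-out data is an assumption, as are the pitch/yaw (radians) representation, the axis convention, and every function name (`pitchyaw_to_vector`, `mean_angular_error_deg`, `fit_weights`, `ensemble`). The data is synthetic.

```python
import numpy as np

def pitchyaw_to_vector(py):
    """Convert (N, 2) pitch/yaw angles in radians to (N, 3) unit vectors.
    The axis convention is an assumption; angular error does not depend on it
    as long as predictions and targets use the same one."""
    pitch, yaw = py[:, 0], py[:, 1]
    return np.stack([
        np.cos(pitch) * np.sin(yaw),
        np.sin(pitch),
        np.cos(pitch) * np.cos(yaw),
    ], axis=1)

def mean_angular_error_deg(pred, target):
    """Mean angle in degrees between predicted and target gaze directions."""
    a = pitchyaw_to_vector(pred)
    b = pitchyaw_to_vector(target)
    cos = np.clip(np.sum(a * b, axis=1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())

def fit_weights(preds, target):
    """Least-squares weights for K estimators (assumed fitting procedure).
    preds: (K, N, 2) held-out predictions, target: (N, 2) ground truth."""
    k = preds.shape[0]
    a = preds.reshape(k, -1).T          # (N*2, K) design matrix
    w, *_ = np.linalg.lstsq(a, target.reshape(-1), rcond=None)
    return w

def ensemble(preds, weights):
    """Linear combination of K predictions: (K, N, 2) and (K,) -> (N, 2)."""
    return np.tensordot(weights, preds, axes=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.uniform(-0.5, 0.5, size=(500, 2))            # synthetic gaze labels
    preds = target + rng.normal(0.0, 0.05, size=(6, 500, 2))  # six noisy estimators
    w = fit_weights(preds, target)
    combined = ensemble(preds, w)
    print("single-model errors:", [round(mean_angular_error_deg(p, target), 2) for p in preds])
    print("ensemble error:", round(mean_angular_error_deg(combined, target), 2))
```

In practice the weights would be fitted on a validation split and then applied unchanged to test predictions. Fitting and evaluating on the same data, as the demo does for brevity, overstates the gain.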

Code Implementations: 1 repo