CV · Mar 27

Real-time Appearance-based Gaze Estimation for Open Domains

arXiv:2603.26945 · 52.7 · h-index: 12
Predicted impact top 67% in CV · last 90 days · Originality: Incremental advance
AI Analysis

Enables robust, real-time gaze estimation on mobile devices for open-domain applications, addressing a practical bottleneck in unconstrained settings.

Existing appearance-based gaze estimation models fail in unconstrained scenarios like facial wearables and poor lighting. The proposed framework, using augmentation and multi-task learning, achieves SOTA-competitive generalization with <1% of UniGaze-H's parameters, enabling real-time mobile gaze tracking.

Appearance-based gaze estimation (AGE) has achieved remarkable performance in constrained settings, yet we reveal a significant generalization gap where existing AGE models often fail in practical, unconstrained scenarios, particularly those involving facial wearables and poor lighting conditions. We attribute this failure to two core factors: limited image diversity and inconsistent label fidelity across different datasets, especially along the pitch axis. To address these, we propose a robust AGE framework that enhances generalization without requiring additional human-annotated data. First, we expand the image manifold via an ensemble of augmentation techniques, including synthesis of eyeglasses, masks, and varied lighting. Second, to mitigate the impact of anisotropic inter-dataset label deviation, we reformulate gaze regression as a multi-task learning problem, incorporating multi-view supervised contrastive (SupCon) learning, discretized label classification, and eye-region segmentation as auxiliary objectives. To rigorously validate our approach, we curate new benchmark datasets designed to evaluate gaze robustness under challenging conditions, a dimension largely overlooked by existing evaluation protocols. Our MobileNet-based lightweight model achieves generalization performance competitive with the state-of-the-art (SOTA) UniGaze-H, while utilizing less than 1% of its parameters, enabling high-fidelity, real-time gaze tracking on mobile devices.
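
The abstract names discretized label classification as one auxiliary objective alongside gaze regression. The sketch below illustrates that general idea for a single angle (for example pitch): the continuous label is binned, and a cross-entropy term over the bins is added to an L1 regression term. It is an illustration only, not the paper's implementation: the angle range, bin count, loss weight, and all function names (`angle_to_bin`, `cross_entropy`, `multitask_loss`) are assumptions, and the SupCon and segmentation objectives are omitted.

```python
import math


def angle_to_bin(angle_deg, lo=-90.0, hi=90.0, num_bins=90):
    """Map an angle in degrees to a bin index in [0, num_bins - 1].

    The range and bin count are assumed values chosen for illustration.
    """
    clipped = min(max(angle_deg, lo), hi)
    width = (hi - lo) / num_bins
    idx = int((clipped - lo) / width)
    # The upper edge (angle == hi) would land on num_bins, so clamp it.
    return min(idx, num_bins - 1)


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for one sample."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def multitask_loss(pred_angle, logits, true_angle, cls_weight=0.5):
    """L1 regression loss plus a weighted bin-classification loss.

    cls_weight is a hypothetical weighting, not a value from the paper.
    """
    regression = abs(pred_angle - true_angle)
    classification = cross_entropy(logits, angle_to_bin(true_angle))
    return regression + cls_weight * classification


if __name__ == "__main__":
    uniform_logits = [0.0] * 90  # an untrained classifier head
    print(multitask_loss(10.0, uniform_logits, 12.0))
```

One plausible reason such a coarse classification target helps is that small, dataset-specific label offsets often fall within the same bin, so the classification term is less sensitive to them than the regression term; this is an interpretation, not a claim taken from the abstract.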

