CV, LG | Dec 15, 2021

Gaze Estimation with Eye Region Segmentation and Self-Supervised Multistream Learning

arXiv:2112.07878v1 | 5 citations
Originality: Highly original
AI Analysis

This work addresses gaze estimation for human-computer interaction. It is an incremental improvement on existing gaze estimation methods, and its novel components are eye region segmentation and self-supervised pretraining.

The paper tackles gaze estimation by introducing a multistream network that combines eye region segmentation with self-supervised learning. It reports state-of-the-art results on the EYEDIAP dataset, outperforming all existing benchmarks on that dataset.

We present a novel multistream network that learns robust eye representations for gaze estimation. We first use a simulator to create a synthetic dataset containing eye region masks that detail the visible eyeball and iris. We then perform eye region segmentation with a U-Net-type model, which we later use to generate eye region masks for real-world eye images. Next, we pretrain an eye image encoder in the real domain with self-supervised contrastive learning to learn generalized eye representations. Finally, this pretrained eye encoder and two additional encoders, one for the visible eyeball region and one for the iris, are used in parallel in our multistream framework to extract salient features for gaze estimation from real-world images. We demonstrate the performance of our method on the EYEDIAP dataset in two different evaluation settings and achieve state-of-the-art results, outperforming all the existing benchmarks on this dataset. We also conduct additional experiments to validate the robustness of our self-supervised network with respect to different amounts of labeled data used for training.
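The contrastive pretraining step mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which contrastive objective the authors use, so the NT-Xent (SimCLR-style) loss below is an assumption chosen for illustration. The function names `cosine` and `nt_xent`, the temperature of 0.5, and the toy 2-D embeddings are likewise hypothetical and not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(views_a, views_b, temperature=0.5):
    """Mean NT-Xent loss over 2N embeddings (assumed objective, not confirmed
    by the paper). views_a[i] and views_b[i] are embeddings of two augmented
    views of the same eye image and form a positive pair; every other
    embedding in the batch acts as a negative."""
    z = list(views_a) + list(views_b)
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same image
        # Similarities to every other embedding (self-similarity excluded).
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - pos_logit
    return total / (2 * n)


# Toy embeddings: two images, two views each.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]      # each view close to its own anchor
mismatched = [[0.1, 0.9], [0.9, 0.1]]   # each view close to the wrong anchor

loss_aligned = nt_xent(anchors, aligned)
loss_mismatched = nt_xent(anchors, mismatched)
```

In this sketch the loss is lower when the two views of the same image map to nearby embeddings, which is the property a contrastively pretrained eye encoder is meant to have before it is combined with the eyeball and iris streams.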
