Jeff B. Pelz

h-index: 29

6 papers

366 citations

Novelty: 36%

AI Score: 26

Ranked #161,360 of 194,257 authors (top 83%); #52,087 in CV (top 88%)

6 Papers

7.3 | CV | May 4, 2022

EllSeg-Gen, towards Domain Generalization for head-mounted eyetracking

Rakshit S. Kothari, Reynold J. Bailey, Christopher Kanan et al.

The study of human gaze behavior in natural contexts requires algorithms for gaze estimation that are robust to a wide range of imaging conditions. However, algorithms often fail to identify features such as the iris and pupil centroid in the presence of reflective artifacts and occlusions. Previous work has shown that convolutional networks excel at extracting gaze features despite the presence of such artifacts. However, these networks often perform poorly on data unseen during training. This work follows the intuition that jointly training a convolutional network with multiple datasets learns a generalized representation of eye parts. We compare the performance of a single model trained with multiple datasets against a pool of models trained on individual datasets. Results indicate that models tested on datasets in which eye images exhibit higher appearance variability benefit from multiset training. In contrast, dataset-specific models generalize better onto eye images with lower appearance variability.

3.7 | CV | Mar 17, 2021

Semi-Supervised Learning for Eye Image Segmentation

Aayush K. Chaudhary, Prashnna K. Gyawali, Linwei Wang et al.

Recent advances in appearance-based models have shown improved eye tracking performance in difficult scenarios such as occlusion due to eyelashes, eyelids, or camera placement, and environmental reflections on the cornea and glasses. The key reason for the improvement is the accurate and robust identification of eye parts (pupil, iris, and sclera regions). The improved accuracy often comes at the cost of labeling an enormous dataset, which is complex and time-consuming. This work presents two semi-supervised learning frameworks that identify eye parts by taking advantage of unlabeled images where labeled datasets are scarce. With these frameworks, which leverage domain-specific augmentation and novel spatially varying transformations for image segmentation, we show improved performance on various test cases. For instance, for a model trained on just 48 labeled images, these frameworks achieved an improvement of 0.38% and 0.65% in segmentation performance over the baseline model, which is trained only with the labeled dataset.

1.2 | CV | Sep 20, 2020

$\pi_t$ - Enhancing the Precision of Eye Tracking using Iris Feature Motion Vectors

Aayush K. Chaudhary, Jeff B. Pelz

A new high-precision eye-tracking method has been demonstrated recently by tracking the motion of iris features rather than by exploiting pupil edges. While the method provides high precision, it suffers from temporal drift, an inability to track across blinks, and loss of texture matches in the presence of motion blur. In this work, we present a new methodology, $\pi_t$, to address these issues by optimally combining the information from both iris textures and pupil edges. With this method, we show an improvement in precision (S2S-RMS and STD) of at least 48% and 10%, respectively, while fixating a series of small targets and following a smoothly moving target. Further, we demonstrate the capability to identify microsaccades between targets separated by 0.2 degrees.
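
The two precision measures named in this abstract can be illustrated with a minimal sketch. It assumes the common eye-tracking definitions (S2S-RMS as the root mean square of sample-to-sample distances, STD as the RMS distance of samples from their mean position); the abstract does not spell out the exact formulas, and the function names `s2s_rms` and `std_precision` are hypothetical.

```python
import math


def s2s_rms(samples):
    """RMS of distances between consecutive gaze samples.

    `samples` is a sequence of (x, y) gaze positions, e.g. in degrees.
    Assumed definition; the paper may compute it differently.
    """
    if len(samples) < 2:
        raise ValueError("need at least two gaze samples")
    squared = [
        (x1 - x0) ** 2 + (y1 - y0) ** 2
        for (x0, y0), (x1, y1) in zip(samples, samples[1:])
    ]
    return math.sqrt(sum(squared) / len(squared))


def std_precision(samples):
    """RMS distance of gaze samples from their mean position.

    Uses the population form (divide by n), which is an assumption.
    """
    if not samples:
        raise ValueError("need at least one gaze sample")
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    squared = [(x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in samples]
    return math.sqrt(sum(squared) / n)
```

Lower values mean higher precision for both measures. S2S-RMS is sensitive to high-frequency noise between neighboring samples, while STD also captures slow drift during a fixation, which may be why the two are reported separately.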

14.0 | CV | Jul 19, 2020 | Code

EllSeg: An Ellipse Segmentation Framework for Robust Gaze Tracking

Rakshit S. Kothari, Aayush K. Chaudhary, Reynold J. Bailey et al.

Ellipse fitting, an essential component in pupil or iris tracking based video oculography, is performed on previously segmented eye parts generated using various computer vision techniques. Several factors, such as occlusions due to eyelid shape, camera position or eyelashes, frequently break ellipse fitting algorithms that rely on well-defined pupil or iris edge segments. In this work, we propose training a convolutional neural network to directly segment entire elliptical structures and demonstrate that such a framework is robust to occlusions and offers superior pupil and iris tracking performance (at least 10% and 24% increase in pupil and iris center detection rate respectively within a two-pixel error margin) compared to using standard eye parts segmentation for multiple publicly available synthetic segmentation datasets.
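
The "center detection rate within a two-pixel error margin" can be sketched as the fraction of frames whose predicted center lies within a fixed Euclidean distance of the ground truth. This is a minimal illustration under that assumption; the function name `center_detection_rate` and the choice to count an error of exactly two pixels as a hit are hypothetical, not taken from the paper.

```python
import math


def center_detection_rate(predicted, ground_truth, max_error_px=2.0):
    """Fraction of frames whose predicted center is within `max_error_px`
    (Euclidean distance, inclusive) of the ground-truth center.

    Both inputs are equal-length sequences of (x, y) pixel coordinates.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have equal length")
    if not predicted:
        raise ValueError("need at least one frame")
    hits = sum(
        1
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
        if math.hypot(px - gx, py - gy) <= max_error_px
    )
    return hits / len(predicted)
```

For example, with pixel errors of 1, 5, and 2 across three frames, the rate at a two-pixel margin is 2/3.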

15.5 | CV | Oct 1, 2019 | Code

RITnet: Real-time Semantic Segmentation of the Eye for Gaze Tracking

Aayush K. Chaudhary, Rakshit Kothari, Manoj Acharya et al.

Accurate eye segmentation can improve eye-gaze estimation and support interactive computing based on visual attention; however, existing eye segmentation methods suffer from issues such as person-dependent accuracy, lack of robustness, and an inability to be run in real-time. Here, we present the RITnet model, which is a deep neural network that combines U-Net and DenseNet. RITnet is under 1 MB and achieves 95.3% accuracy on the 2019 OpenEDS Semantic Segmentation challenge. Using a GeForce GTX 1080 Ti, RITnet tracks at > 300 Hz, enabling real-time gaze tracking applications. Pre-trained models and source code are available at https://bitbucket.org/eye-ush/ritnet/.

15.0 | CV | May 9, 2019

Gaze-in-wild: A dataset for studying eye and head coordination in everyday activities

Rakshit Kothari, Zhizhuo Yang, Christopher Kanan et al.

The interaction between the vestibular and ocular system has primarily been studied in controlled environments. Consequently, off-the-shelf tools for categorization of gaze events (e.g. fixations, pursuits, saccades) fail when head movements are allowed. Our approach was to collect a novel, naturalistic, and multimodal dataset of eye+head movements when subjects performed everyday tasks while wearing a mobile eye tracker equipped with an inertial measurement unit and a 3D stereo camera. This Gaze-in-the-Wild dataset (GW) includes eye+head rotational velocities (deg/s), infrared eye images and scene imagery (RGB+D). A portion was labelled by coders into gaze motion events with a mutual agreement of 0.72 sample-based Cohen's $\kappa$. This labelled data was used to train and evaluate two machine learning algorithms, Random Forest and a Recurrent Neural Network model, for gaze event classification. Assessment involved the application of established and novel event-based performance metrics. Classifiers achieve ~90% of human performance in detecting fixations and saccades but fall short (60%) on detecting pursuit movements. Moreover, pursuit classification is far worse in the absence of head movement information. A subsequent analysis of feature significance in our best-performing model revealed a reliance upon absolute eye and head velocity, indicating that classification does not require spatial alignment of the head and eye tracking coordinate systems. The GW dataset, trained classifiers and evaluation metrics will be made publicly available with the intention of facilitating growth in the emerging area of head-free gaze event classification.
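
The sample-based Cohen's $\kappa$ mentioned in this abstract compares two coders' labels sample by sample and corrects the raw agreement for agreement expected by chance. The following is a minimal sketch of the standard two-rater formula, $\kappa = (p_o - p_e) / (1 - p_e)$; the function name `cohens_kappa` and the label strings are hypothetical, and how the dataset's coders were actually paired is not stated here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of per-sample labels."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have equal length")
    n = len(labels_a)
    if n == 0:
        raise ValueError("need at least one labelled sample")

    # Observed agreement: fraction of samples given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each coder's marginal label frequencies,
    # summed over labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both coders used one identical label throughout; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, so the reported 0.72 sits well above chance but short of perfect consistency between coders.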