Zero-Shot Pupil Segmentation with SAM 2: A Case Study of Over 14 Million Images
This work aims to reduce annotation time and lower technical barriers for researchers and practitioners in eye tracking. The contribution is incremental, since it applies an existing model to a new domain.
The paper tackles pupil segmentation for gaze estimation and eye tracking by applying SAM 2, a vision foundation model, to over 14 million eye images, achieving competitive mIoU scores of up to 93% without fine-tuning.
We explore the transformative potential of SAM 2, a vision foundation model, in advancing gaze estimation and eye tracking technologies. By significantly reducing annotation time, lowering technical barriers through its ease of deployment, and enhancing segmentation accuracy, SAM 2 addresses critical challenges faced by researchers and practitioners. Using its zero-shot segmentation capabilities with minimal user input (a single click per video), we tested SAM 2 on over 14 million eye images from diverse datasets, including virtual reality setups and the world's largest unified dataset recorded using wearable eye trackers. Remarkably, in pupil segmentation tasks, SAM 2 matches the performance of domain-specific models trained solely on eye images, achieving competitive mean Intersection over Union (mIoU) scores of up to 93% without fine-tuning. Additionally, we provide our code and segmentation masks for these widely used datasets to promote further research.
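To make the reported metric concrete, the following is a minimal sketch of how a mean Intersection over Union score can be computed for binary pupil masks. It is not the authors' evaluation code: the function names `iou` and `mean_iou`, the per-image averaging, and the convention of scoring two empty masks as 1.0 are all assumptions made for illustration, and masks are represented as nested lists of 0/1 values.

```python
from typing import Sequence

# A binary mask as rows of 0/1 values (hypothetical representation).
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over Union of two binary masks of equal shape."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p_on, t_on = bool(p), bool(t)
            intersection += p_on and t_on  # pixel set in both masks
            union += p_on or t_on          # pixel set in at least one mask
    if union == 0:
        # Assumption: two empty masks (no pupil visible) count as a perfect match.
        return 1.0
    return intersection / union


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average the per-image IoU over a set of images (assumed convention)."""
    if not preds or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and equally long")
    return sum(iou(p, t) for p, t in zip(preds, targets)) / len(preds)


if __name__ == "__main__":
    predicted = [[[1, 1], [0, 0]], [[0, 1], [0, 1]]]
    ground_truth = [[[1, 0], [0, 0]], [[0, 1], [0, 1]]]
    print(mean_iou(predicted, ground_truth))
```

In this toy example the first pair overlaps on one of two foreground pixels and the second pair matches exactly, so the per-image scores are averaged into a single value. Other conventions, such as pooling pixels across a whole dataset before dividing, give different numbers, so the averaging scheme should be checked against the paper before comparing results.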