Zero-Shot Segmentation of Eye Features Using the Segment Anything Model (SAM)
This work addresses the problem of reducing manual annotation for gaze-estimation datasets. The contribution is incremental, as it applies an existing model to a new domain.
The study evaluated the Segment Anything Model (SAM) for zero-shot segmentation of eye features in virtual reality images. It found that, with prompts such as bounding boxes, SAM can match specialized models for some features, reaching, for example, 93.34% IoU for pupil segmentation on one dataset.
The advent of foundation models signals a new era in artificial intelligence. The Segment Anything Model (SAM) is the first foundation model for image segmentation. In this study, we evaluate SAM's ability to segment features from eye images recorded in virtual reality setups. The growing need for annotated eye-image datasets presents a significant opportunity for SAM to redefine the landscape of data annotation in gaze estimation. Our investigation centers on SAM's zero-shot learning abilities and on the effectiveness of prompts such as bounding boxes or point clicks. Our results are consistent with studies in other domains: depending on the feature, SAM's segmentation performance can be on par with that of specialized models, and prompts improve its performance, as evidenced by an IoU of 93.34% for pupil segmentation on one dataset. Foundation models like SAM could revolutionize gaze estimation by enabling quick and easy image segmentation, reducing reliance on specialized models and extensive manual annotation.
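The results above are reported as Intersection over Union (IoU), the area of overlap between a predicted mask and the ground-truth mask divided by the area of their union. The sketch below shows the standard per-image computation for binary masks using NumPy. The function name `mask_iou` and the convention of scoring two empty masks as 1.0 are illustrative assumptions. They are not taken from the study, which does not specify how scores were averaged across images.

```python
import numpy as np

def mask_iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over Union between two binary masks of the same shape.

    Hypothetical helper for illustration; not the study's evaluation code.
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks are empty: scored as perfect agreement (assumed convention).
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection / union)

# Toy example: a 4x4 "pupil" ground truth and a prediction shifted one column right.
gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:6] = True      # 16 pixels
pred = np.zeros((8, 8), dtype=bool)
pred[2:6, 3:7] = True    # 16 pixels, 12 of which overlap the ground truth
print(f"IoU = {mask_iou(pred, gt):.4f}")  # 12 / 20 -> IoU = 0.6000
```

Multiplying the returned value by 100 gives a percentage in the same form as the 93.34% reported for pupil segmentation.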