Segment Anything in Light Fields for Real-Time Applications via Constrained Prompting
This work addresses the need for efficient and accurate light field segmentation in applications such as object pose tracking. It is an incremental improvement that leverages an existing model together with domain-specific constraints.
The paper addresses object segmentation in light field images by adapting the Segment Anything Model 2 (SAM 2) to this domain without retraining. The resulting masks are high-quality and view-consistent, outperform the SAM 2 video tracking baseline, and are produced 7 times faster, at real-time speed.
Segmented light field images can serve as a powerful representation in many computer vision tasks that exploit the geometry and appearance of objects, such as object pose tracking. In the light field domain, segmentation has the additional objective of recognizing the same segment across all views. Segment Anything Model 2 (SAM 2) produces semantically meaningful segments for monocular images and videos. However, applying SAM 2 directly to light fields is highly ineffective because it does not exploit the constraints of the light field domain. In this work, we present a novel light field segmentation method that adapts SAM 2 to the light field domain without retraining or modifying the model. By utilizing the light field domain constraints, the method produces high-quality, view-consistent light field masks, outperforming the SAM 2 video tracking baseline and running 7 times faster, at real-time speed. We achieve this by exploiting epipolar geometry cues to propagate the masks between the views, probing the SAM 2 latent space to estimate their occlusion, and further prompting SAM 2 to refine them.
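The mask propagation step can be illustrated with a minimal sketch. It is not the method of the paper: it assumes a regular camera grid, a single constant disparity per object (in pixels per view step), integer pixel shifts, and a particular sign convention for the shift. The names `shift_mask`, `propagate_mask`, and `mask_to_box` are hypothetical, and the bounding box is only one possible form of prompt that could be passed to a segmentation model for refinement.

```python
import numpy as np

def shift_mask(mask, dy, dx):
    """Translate a binary mask by integer (dy, dx) pixels; vacated pixels become False."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    if abs(dy) >= h or abs(dx) >= w:
        return out  # the mask is shifted completely out of the image
    # Source and destination windows that stay inside the image bounds.
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = mask[src_y, src_x]
    return out

def propagate_mask(mask, disparity, du, dv):
    """Move a reference-view mask to the view offset by (du, dv) on the camera grid.

    Assumption: a point with disparity d moves by d * du pixels horizontally
    and d * dv pixels vertically between the two views.
    """
    dx = int(round(disparity * du))
    dy = int(round(disparity * dv))
    return shift_mask(mask, dy, dx)

def mask_to_box(mask):
    """Return (x_min, y_min, x_max, y_max) of a non-empty mask, or None if it is empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Usage: a 2x2 object in the reference view, propagated one view to the right.
reference = np.zeros((6, 8), dtype=bool)
reference[2:4, 3:5] = True
neighbour = propagate_mask(reference, disparity=2.0, du=1, dv=0)
box_prompt = mask_to_box(neighbour)
```

A constant shift ignores occlusion and depth variation within the object, which is why the propagated mask is treated here only as an initial estimate for a later refinement step.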