IV, CV · Aug 3, 2024

Zero-Shot Surgical Tool Segmentation in Monocular Video Using Segment Anything Model 2

Ange Lou, Yamin Li, Yike Zhang, Robert F. Labadie, Jack Noble

arXiv:2408.01648v118.423 citations · h-index: 6 · Has Code

Originality: Synthesis-oriented

AI Analysis

This paper addresses surgical tool segmentation for medical applications, but its contribution is incremental: it applies an existing foundation model to a new domain.

The study evaluated the Segment Anything Model 2 (SAM 2) for zero-shot surgical tool segmentation in monocular video. It found SAM 2 capable of segmenting a variety of surgical videos, although it needs additional prompts when new tools enter the scene, and its robustness is affected by challenges specific to surgical video.

The Segment Anything Model 2 (SAM 2) is the latest-generation foundation model for image and video segmentation. Trained on the expansive Segment Anything Video (SA-V) dataset, which comprises 35.5 million masks across 50.9K videos, SAM 2 advances its predecessor's capabilities by supporting zero-shot segmentation through various prompts (e.g., points, boxes, and masks). Its robust zero-shot performance and efficient memory usage make SAM 2 particularly appealing for surgical tool segmentation in videos, especially given the scarcity of labeled data and the diversity of surgical procedures. In this study, we evaluate the zero-shot video segmentation performance of the SAM 2 model across different types of surgeries, including endoscopy and microscopy. We also assess its performance on videos of varying lengths featuring single and multiple tools to demonstrate SAM 2's applicability and effectiveness in the surgical domain. We found that: 1) SAM 2 demonstrates a strong capability for segmenting various surgical videos; 2) when new tools enter the scene, additional prompts are necessary to maintain segmentation accuracy; and 3) specific challenges inherent to surgical videos can impact the robustness of SAM 2.
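The second finding, that a tool entering mid-video needs its own prompt, can be illustrated with a small bookkeeping sketch. It does not call SAM 2 and is not the authors' code. The `PointPrompt` class, the `build_prompt_schedule` helper, and the example coordinates are all hypothetical names and values introduced here for illustration. The sketch assumes that each tool is prompted with a single positive point on the frame where it first appears.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PointPrompt:
    """Hypothetical record of one point prompt for one tool."""
    frame_idx: int           # frame on which the prompt is placed
    obj_id: int              # identifier assigned to the tool
    xy: Tuple[float, float]  # pixel coordinates of the click
    label: int = 1           # 1 = positive (foreground) point


def build_prompt_schedule(
    first_seen: Dict[int, Tuple[int, Tuple[float, float]]]
) -> Dict[int, List[PointPrompt]]:
    """Group prompts by the frame on which each tool first appears.

    first_seen maps obj_id -> (frame_idx, (x, y)).
    Returns a dict ordered by frame index.
    """
    schedule: Dict[int, List[PointPrompt]] = {}
    for obj_id, (frame_idx, xy) in first_seen.items():
        schedule.setdefault(frame_idx, []).append(
            PointPrompt(frame_idx, obj_id, xy)
        )
    return dict(sorted(schedule.items()))


# Made-up example: two tools visible from the start, a third enters at frame 45.
first_seen = {
    1: (0, (320.0, 240.0)),
    2: (0, (100.0, 80.0)),
    3: (45, (500.0, 300.0)),
}
schedule = build_prompt_schedule(first_seen)

for frame_idx, prompts in schedule.items():
    print(frame_idx, [p.obj_id for p in prompts])
# prints:
# 0 [1, 2]
# 45 [3]
```

In a real pipeline, each scheduled prompt would be handed to the segmentation model's prompting interface on its frame before masks are propagated through the rest of the video.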
