IV · CV · Aug 3, 2024

Zero-Shot Surgical Tool Segmentation in Monocular Video Using Segment Anything Model 2

arXiv:2408.01648v1 · 23 citations · h-index: 6
Originality: Synthesis-oriented
AI Analysis

This paper addresses surgical tool segmentation for medical applications, but its contribution is incremental: it applies an existing foundation model to a new domain.

The study evaluated the Segment Anything Model 2 (SAM 2) for zero-shot surgical tool segmentation in monocular video, finding it capable of segmenting various surgical videos but requiring additional prompts for new tools and facing challenges from surgical video specifics.

The Segment Anything Model 2 (SAM 2) is the latest generation foundation model for image and video segmentation. Trained on the expansive Segment Anything Video (SA-V) dataset, which comprises 35.5 million masks across 50.9K videos, SAM 2 advances its predecessor's capabilities by supporting zero-shot segmentation through various prompts (e.g., points, boxes, and masks). Its robust zero-shot performance and efficient memory usage make SAM 2 particularly appealing for surgical tool segmentation in videos, especially given the scarcity of labeled data and the diversity of surgical procedures. In this study, we evaluate the zero-shot video segmentation performance of the SAM 2 model across different types of surgeries, including endoscopy and microscopy. We also assess its performance on videos featuring single and multiple tools of varying lengths to demonstrate SAM 2's applicability and effectiveness in the surgical domain. We found that: 1) SAM 2 demonstrates a strong capability for segmenting various surgical videos; 2) When new tools enter the scene, additional prompts are necessary to maintain segmentation accuracy; and 3) Specific challenges inherent to surgical videos can impact the robustness of SAM 2.
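Finding 2 above implies a simple bookkeeping step: each tool needs its own prompt on the frame where it first enters the scene. The following is a minimal Python sketch of that step only, not the authors' code and not the SAM 2 API. The function name `prompt_schedule`, the per-frame sets of tool labels, and the example clip are all hypothetical and assumed here for illustration; in practice the resulting frame indices would tell an annotator where to supply a point, box, or mask prompt.

```python
from typing import Dict, List, Set


def prompt_schedule(visible_tools: List[Set[str]]) -> Dict[int, List[str]]:
    """Map frame index -> tools that first appear there and need a new prompt.

    `visible_tools[i]` is a hypothetical set of tool labels visible in frame i.
    """
    seen: Set[str] = set()
    schedule: Dict[int, List[str]] = {}
    for frame_idx, tools in enumerate(visible_tools):
        new_tools = sorted(tools - seen)  # tools that have not been prompted yet
        if new_tools:
            schedule[frame_idx] = new_tools
            seen.update(new_tools)
    return schedule


# Hypothetical clip: forceps visible from frame 0, scissors enter at frame 2.
frames = [{"forceps"}, {"forceps"}, {"forceps", "scissors"}, {"scissors"}]
schedule = prompt_schedule(frames)
```

Here `schedule` lists frame 0 for the forceps and frame 2 for the scissors, so a single prompt on the first frame would not cover the second tool.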

Code Implementations: 1 repo
