CV, Nov 13, 2024

Zero-shot capability of SAM-family models for bone segmentation in CT scans

Caroline Magg, Hoel Kervadec, Clara I. Sánchez

arXiv:2411.08629v13.71 citations, h-index: 12

Originality: Synthesis-oriented

AI Analysis

This work addresses a gap in understanding SAM-family model performance for bone segmentation in CT scans, providing guidelines for medical image segmentation, but it is incremental as it applies existing models to a new domain.

The study evaluated the zero-shot capability of SAM-family models for bone segmentation in CT scans using non-iterative prompting strategies, finding that SAM and SAM2 with bounding box and center point prompts performed best across three skeletal regions, with results varying by model type, size, and dataset characteristics.

The Segment Anything Model (SAM) and similar models form a family of promptable foundation models (FMs) for image and video segmentation. The object of interest is identified using prompts, such as bounding boxes or points. With these FMs becoming part of medical image segmentation, extensive evaluation studies are required to assess their strengths and weaknesses in clinical settings. Since performance is highly dependent on the chosen prompting strategy, it is important to investigate different prompting techniques to define optimal guidelines that ensure effective use in medical image segmentation. Currently, no dedicated evaluation studies exist specifically for bone segmentation in CT scans, leaving a gap in understanding the performance for this task. Thus, we use non-iterative, "optimal" prompting strategies composed of bounding boxes, points, and their combinations to test the zero-shot capability of SAM-family models for bone CT segmentation on three different skeletal regions. Our results show that the best settings depend on the model type and size, the dataset characteristics, and the objective to optimize. Overall, SAM and SAM2 prompted with a bounding box in combination with the center point for all the components of an object yield the best results across all tested settings. As the results depend on multiple factors, we provide a guideline for informed decision-making in 2D prompting with non-interactive, "optimal" prompts.
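To make the best-performing prompt from the abstract concrete (a bounding box combined with a center point for every component of an object), the following is a minimal sketch of how such a prompt could be derived from a 2D ground-truth mask. It is an illustration under stated assumptions, not the authors' implementation: the function names, the use of 4-connectivity, and the choice of "the foreground pixel closest to the component centroid" as the center point are all hypothetical. The output uses an (x_min, y_min, x_max, y_max) box and (x, y) points, which is assumed here to be the layout a SAM-style predictor would take.

```python
from collections import deque


def connected_components(mask):
    """Return the 4-connected foreground components of a 2D binary mask.

    `mask` is a list of rows; each component is a list of (row, col) pixels.
    4-connectivity is an assumption made for this sketch.
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def center_point(pixels):
    """Return (x, y) of the component pixel closest to the component centroid.

    Snapping to a real foreground pixel keeps the point inside the object even
    for non-convex shapes. This definition of "center" is an assumption.
    """
    cy = sum(y for y, _ in pixels) / len(pixels)
    cx = sum(x for _, x in pixels) / len(pixels)
    y, x = min(pixels, key=lambda p: (p[0] - cy) ** 2 + (p[1] - cx) ** 2)
    return (x, y)


def box_and_center_prompts(mask):
    """Build one box plus one positive point per component, or None if empty."""
    components = connected_components(mask)
    if not components:
        return None
    ys = [y for comp in components for y, _ in comp]
    xs = [x for comp in components for _, x in comp]
    return {
        "box": (min(xs), min(ys), max(xs), max(ys)),   # one box around the whole object
        "points": [center_point(comp) for comp in components],
        "labels": [1] * len(components),               # 1 marks a positive (foreground) point
    }
```

Because the prompt is computed from the reference mask in a single pass, it is non-iterative in the sense used above: no model prediction is fed back to refine the prompt.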
