CV · Jun 23, 2023

How to Efficiently Adapt Large Segmentation Model(SAM) to Medical Images

arXiv:2306.13731v1 · 96 citations · h-index: 18 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficiently adapting large segmentation models to medical imaging, a step toward making SAM a practical foundation model for computer vision. The contribution is incremental because it builds on the existing SAM architecture.

The paper tackles the performance drop of the Segment Anything Model (SAM) on medical images by freezing SAM's encoder and finetuning lightweight, prompt-free prediction heads (ViT, CNN, linear). The results show a significant improvement in segmentation accuracy on a public medical dataset, even with just one labeled volume, and the AutoSAM (ViT) and CNN heads outperform training from scratch and self-supervised methods when annotations are limited.

The emerging large-scale segmentation model, Segment Anything (SAM), exhibits impressive zero-shot segmentation capabilities on natural images. However, when applied to medical images, SAM suffers from a noticeable performance drop. To make SAM a real "foundation model" for the computer vision community, it is critical to find an efficient way to customize SAM for medical image datasets. In this work, we propose to freeze the SAM encoder and finetune a lightweight task-specific prediction head, since the encoder accounts for most of SAM's weights. In addition, SAM is a promptable model, but prompts are not necessarily available in all applications, and providing precise prompts for multi-class segmentation is time-consuming. Therefore, we explore three types of prompt-free prediction heads in this work: ViT, CNN, and linear layers. For the ViT head, we remove the prompt tokens from SAM's mask decoder; we name the resulting model AutoSAM. After this modification, AutoSAM can also generate masks for different classes in a single inference pass. To evaluate the label efficiency of our finetuning method, we compare the results of these three prediction heads on a public medical image segmentation dataset with limited labeled data. Experiments demonstrate that finetuning SAM significantly improves its performance on the medical image dataset, even with just one labeled volume. Moreover, the AutoSAM and CNN prediction heads also achieve better segmentation accuracy than training from scratch and self-supervised learning approaches when annotations are scarce.
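The frozen-encoder recipe described in the abstract can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `ToyEncoder` is a hypothetical stand-in for SAM's image encoder (the real one is a large ViT), `CNNHead` is an illustrative prompt-free head, and the tensor sizes, class count, and learning rate are arbitrary. It shows only the mechanics: the encoder's weights are frozen, the optimizer sees only the head, and a single forward pass yields logits for all classes at once.

```python
import torch
import torch.nn as nn


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for SAM's image encoder (NOT the real SAM ViT)."""

    def __init__(self, in_ch=3, embed_dim=32):
        super().__init__()
        # 16x16 patch embedding followed by a small pointwise block.
        self.patch_embed = nn.Conv2d(in_ch, embed_dim, kernel_size=16, stride=16)
        self.block = nn.Sequential(nn.Conv2d(embed_dim, embed_dim, 1), nn.GELU())

    def forward(self, x):
        return self.block(self.patch_embed(x))


class CNNHead(nn.Module):
    """Illustrative prompt-free head: features -> per-pixel class logits."""

    def __init__(self, embed_dim=32, num_classes=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(embed_dim, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            # Undo the 16x downsampling of the patch embedding.
            nn.Upsample(scale_factor=16, mode="bilinear", align_corners=False),
            nn.Conv2d(16, num_classes, kernel_size=1),
        )

    def forward(self, feats):
        return self.net(feats)


def finetune_step(encoder, head, images, labels, optimizer, criterion):
    """One training step that updates only the head."""
    with torch.no_grad():  # no gradients flow into the frozen encoder
        feats = encoder(images)
    logits = head(feats)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return logits.detach(), loss.item()


torch.manual_seed(0)
encoder, head = ToyEncoder(), CNNHead(num_classes=4)

# Freeze the encoder: its parameters receive no updates.
for p in encoder.parameters():
    p.requires_grad = False
encoder.eval()

# The optimizer is given the head's parameters only.
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Random placeholder data (assumption): 2 images, 4 classes, 64x64 pixels.
images = torch.randn(2, 3, 64, 64)
labels = torch.randint(0, 4, (2, 64, 64))

# Snapshots so we can check which weights actually changed.
encoder_before = [p.detach().clone() for p in encoder.parameters()]
head_before = [p.detach().clone() for p in head.parameters()]

logits, loss = finetune_step(encoder, head, images, labels, optimizer, criterion)

# One inference pass gives a label map covering every class.
masks = logits.argmax(dim=1)
```

With the real model, the encoder would be loaded from a SAM checkpoint instead of being randomly initialized; because it stays fixed, its features could also be precomputed once per image, which is one reason this setup is cheap to train.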

Code Implementations: 1 repo
