Automating MedSAM by Learning Prompts with Weak Few-Shot Supervision
This work aims to reduce the user interaction and annotation costs of deploying foundation models such as MedSAM in medical imaging applications.
It automates MedSAM, a foundation model for medical image segmentation, by replacing manual prompts with a lightweight module that learns prompt embeddings directly from image embeddings. The module is trained with weak labels (tight bounding boxes) and few-shot supervision (10 samples), and the method is evaluated on three medical datasets spanning MR and ultrasound imaging.
Foundation models such as the recently introduced Segment Anything Model (SAM) have achieved remarkable results in image segmentation tasks. However, these models typically require user interaction through handcrafted prompts such as bounding boxes, which limits their deployment on downstream tasks. Adapting these models to a specific task with fully labeled data also demands expensive prior user interaction to obtain ground-truth annotations. This work proposes to replace conditioning on input prompts with a lightweight module that directly learns a prompt embedding from the image embedding; the foundation model then uses both embeddings to output a segmentation mask. With these learnable prompts, a foundation model can automatically segment a specific region of interest by 1) modifying the input through a prompt embedding predicted by a simple module, and 2) training that module with weak labels (tight bounding boxes) and few-shot supervision (10 samples). Our approach is validated on MedSAM, a version of SAM fine-tuned for medical images, with results on three medical datasets in MR and ultrasound imaging. Our code is available at https://github.com/Minimel/MedSAMWeakFewShotPromptAutomation.
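To make the notion of a weak label concrete, the following minimal Python sketch shows what a tight bounding box is for a binary mask, and how a predicted mask could be checked against such a box. It is an illustration only and is not taken from the released code: the function names `tight_bounding_box` and `is_consistent_with_box` are hypothetical, masks are assumed to be nested lists of 0/1 values, and the consistency check is one simple way to read "tight", not the training objective used in this work.

```python
def tight_bounding_box(mask):
    """Return the tight box (row_min, col_min, row_max, col_max), inclusive.

    `mask` is a list of equal-length rows holding 0/1 values.
    Returns None when the mask is empty or has no foreground pixel.
    """
    if not mask:
        return None
    # Rows and columns that contain at least one foreground pixel.
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None
    return (rows[0], cols[0], rows[-1], cols[-1])


def is_consistent_with_box(pred_mask, box):
    """Hypothetical check: the prediction stays inside the box and touches
    all four of its sides, i.e. its own tight box equals the label box."""
    return tight_bounding_box(pred_mask) == box


# A small object annotated only by its tight box (the weak label).
ground_truth = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
weak_label = tight_bounding_box(ground_truth)

# A different mask can still agree with the same weak label.
prediction = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
consistent = is_consistent_with_box(prediction, weak_label)
```

The example also shows why a box is a weak label: several different masks share the same tight box, so the box constrains the extent of the object without specifying its exact shape.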