MedicoSAM: Towards foundation models for medical image segmentation
This work addresses the need for more efficient and adaptable segmentation tools in medical imaging, though the contribution is incremental because it builds on an existing foundation model.
The authors adapt the Segment Anything foundation model to medical image segmentation by comparing finetuning strategies on a large and diverse dataset. Finetuning clearly improved interactive segmentation, whereas semantic segmentation did not benefit from pretraining on medical images.
Medical image segmentation is an important analysis task in clinical practice and research. Deep learning has massively advanced the field, but current approaches are mostly based on models trained for a specific task. Training such models or adapting them to a new condition is costly due to the need for (manually) labeled data. The emergence of vision foundation models, especially Segment Anything, offers a path to universal segmentation for medical images, overcoming these issues. Here, we study how to improve Segment Anything for medical images by comparing different finetuning strategies on a large and diverse dataset. We evaluate the finetuned models on a wide range of interactive and (automatic) semantic segmentation tasks. We find that the performance can be clearly improved for interactive segmentation. However, semantic segmentation does not benefit from pretraining on medical images. Our best model, MedicoSAM, is publicly available at https://github.com/computational-cell-analytics/medico-sam. We show that it is compatible with existing tools for data annotation and believe that it will be of great practical value.