General Methods Make Great Domain-specific Foundation Models: A Case-study on Fetal Ultrasound
This work addresses efficient model development for medical imaging, specifically fetal ultrasound. It shows that existing methods can be effective without methodological novelty, a contribution that is incremental but practical for resource-constrained settings.
The study asks whether to pretrain a custom foundation model on medical data or to rely on transfer learning from a generalist model. It finds that pretraining with the DINOv2 method on a large fetal ultrasound dataset of 2M images achieves state-of-the-art results on three datasets across classification, segmentation, and few-shot tasks.
With access to large-scale, unlabeled medical datasets, researchers are confronted with two questions: Should they attempt to pretrain a custom foundation model on this medical data, or use transfer learning from an existing generalist model? And, if a custom model is pretrained, are novel methods required? In this paper we explore these questions by conducting a case study, in which we train a foundation model on a large regional fetal ultrasound dataset of 2M images. By selecting the well-established DINOv2 method for pretraining, we achieve state-of-the-art results on three fetal ultrasound datasets, covering data from different countries and spanning classification, segmentation, and few-shot tasks. We compare against models pretrained on natural images, models pretrained on ultrasound images, and supervised baselines. Our results demonstrate two key insights: (i) Pretraining on custom data is worth it, even if smaller models are trained on less data, as scaling in natural image pretraining does not translate to ultrasound performance. (ii) Well-tuned methods from computer vision are making it feasible to train custom foundation models for a given medical domain, requiring no hyperparameter tuning and little methodological adaptation. Given these findings, we argue that a bias towards methodological innovation should be avoided when developing domain-specific foundation models under common computational resource constraints.
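To make the few-shot setting mentioned above concrete, here is a minimal sketch of how a k-shot classification evaluation on frozen embeddings might look. It assumes the feature vectors were already extracted by some frozen, pretrained encoder. The nearest-centroid classifier and all function names (`sample_k_shot`, `class_centroids`, `predict`, `few_shot_accuracy`) are illustrative assumptions, not the evaluation protocol used in the paper.

```python
import random
from collections import defaultdict


def sample_k_shot(labels, k, seed=0):
    """Pick k support indices per class; all remaining indices form the query set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    support = []
    for label, indices in by_class.items():
        if len(indices) < k:
            raise ValueError(f"class {label!r} has fewer than {k} examples")
        support.extend(rng.sample(indices, k))
    support_set = set(support)
    query = [i for i in range(len(labels)) if i not in support_set]
    return support, query


def class_centroids(features, labels, indices):
    """Compute the mean embedding of each class over the given indices."""
    sums = {}
    counts = defaultdict(int)
    for i in indices:
        vec, label = features[i], labels[i]
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {c: [s / counts[c] for s in total] for c, total in sums.items()}


def predict(vec, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda c: sq_dist(vec, centroids[c]))


def few_shot_accuracy(features, labels, k, seed=0):
    """Accuracy on the query set after fitting centroids on k examples per class."""
    support, query = sample_k_shot(labels, k, seed)
    if not query:
        raise ValueError("no query examples left after sampling the support set")
    centroids = class_centroids(features, labels, support)
    correct = sum(predict(features[i], centroids) == labels[i] for i in query)
    return correct / len(query)
```

Because the encoder stays frozen in this kind of setup, the same procedure can be repeated with embeddings from different pretrained models, which is one simple way to compare a domain-specific model against a generalist one under an identical labeling budget.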