A generalizable 3D framework and model for self-supervised learning in medical imaging
This work addresses the need for more generalizable and scalable self-supervised learning (SSL) models in medical imaging, which would enable broader applications. The contribution appears incremental, however, because it adapts existing SSL methods to 3D data.
The paper tackles the limited generalizability and scalability of self-supervised learning for 3D medical imaging by introducing the 3DINO method and the 3DINO-ViT model. 3DINO-ViT is pretrained on a large dataset of ~100,000 scans and outperforms state-of-the-art methods on the majority of evaluation metrics across segmentation and classification tasks.
Current self-supervised learning (SSL) methods for 3D medical imaging rely on simple pretext formulations and organ- or modality-specific datasets, limiting their generalizability and scalability. We present 3DINO, a cutting-edge SSL method adapted to 3D datasets, and use it to pretrain 3DINO-ViT, a general-purpose medical imaging model, on an exceptionally large, multimodal, and multi-organ dataset of ~100,000 3D medical imaging scans from over 10 organs. We validate 3DINO-ViT with extensive experiments on numerous medical imaging segmentation and classification tasks. Our results demonstrate that 3DINO-ViT generalizes across modalities and organs, including out-of-distribution tasks and datasets, and outperforms state-of-the-art methods on the majority of evaluation metrics and labeled dataset sizes. Our 3DINO framework and 3DINO-ViT will be made available to enable research on 3D foundation models and further finetuning for a wide range of medical imaging applications.
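As a rough illustration of what feeding a 3D scan to a vision transformer involves, the sketch below splits a volume into non-overlapping cubic patches, each flattened into one token vector. It is not taken from the 3DINO code: the function name `patchify_3d`, the patch size of 16, and the 64x64x64 random volume are all assumptions chosen for the example, and the actual 3DINO-ViT tokenization may differ.

```python
import numpy as np


def patchify_3d(volume, patch=16):
    """Split a (D, H, W) volume into flattened, non-overlapping cubic patches.

    Hypothetical helper for illustration; not part of the 3DINO release.
    Returns an array of shape (num_patches, patch**3).
    """
    d, h, w = volume.shape
    if d % patch or h % patch or w % patch:
        raise ValueError("each volume dimension must be divisible by the patch size")
    gd, gh, gw = d // patch, h // patch, w // patch
    # Expose the patch grid: (gd, patch, gh, patch, gw, patch).
    blocks = volume.reshape(gd, patch, gh, patch, gw, patch)
    # Move the three grid axes to the front, the three within-patch axes to the back.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5)
    # One row per patch, each row a flattened patch**3 cube.
    return blocks.reshape(gd * gh * gw, patch ** 3)


# Assumed toy input: a random 64x64x64 "scan" (real scans vary in size and spacing).
volume = np.random.default_rng(0).standard_normal((64, 64, 64)).astype(np.float32)
tokens = patchify_3d(volume, patch=16)
print(tokens.shape)  # (64, 4096)
```

With these assumed sizes, a single volume becomes 64 tokens of 4,096 voxels each. In a transformer, a learned linear projection would typically map each token to the embedding dimension before the attention layers.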