Clinical Utility of Foundation Segmentation Models in Musculoskeletal MRI: Biomarker Fidelity and Predictive Outcomes
This work addresses the need for reliable segmentation in musculoskeletal MRI to improve clinical decision-making, though it is incremental as it benchmarks existing models rather than introducing new ones.
The study evaluated three foundation segmentation models (SAM, SAM2, MedSAM) across eleven musculoskeletal MRI datasets, finding that finetuned models achieved consistent agreement with expert measurements for biomarkers like cartilage thickness and muscle volume, and demonstrated utility in predicting clinical outcomes such as total knee replacement.
Effective segmentation is fundamental for quantitative medical imaging; however, foundation segmentation models remain insufficiently evaluated for accuracy and biomarker fidelity across the diverse anatomical contexts and imaging protocols encountered in musculoskeletal (MSK) MRI. We evaluate three widely used segmentation models (SAM, SAM2, MedSAM) across eleven MSK MRI datasets spanning the knee, hip, spine, shoulder, and thigh. Our framework assesses both zero-shot and finetuned performance, with attention to segmentation accuracy, generalizability across imaging protocols, and reliability of derived quantitative biomarkers. Finetuned models showed consistent agreement with expert measurements for biomarkers including cartilage thickness, disc height, muscle volume, and compositional T1rho/T2 values. Automated prompting through the AutoLabel system enabled scalable segmentation, with moderate trade-offs in accuracy. As proof of concept, we applied the validated system to (i) a three-stage knee MRI triage cascade and (ii) a longitudinal landmark model that predicts total knee replacement and incident osteoarthritis. The framework offers a transparent method for benchmarking segmentation tools and connecting model performance to clinical imaging priorities.