Benchmarking of Deep Learning Methods for Generic MRI Multi-Organ Abdominal Segmentation
This work addresses the challenge of generalizable MRI segmentation in medical imaging and provides a benchmarking framework and tools for researchers and practitioners. The contribution is incremental: it compares existing methods and adds one model trained on synthetic data.
The authors benchmarked three state-of-the-art deep learning models for multi-organ abdominal MRI segmentation and introduced ABDSynth, a SynthSeg-based model trained only on CT segmentations, without any real images. MRSegmentator performed best in accuracy and generalizability, while ABDSynth was slightly less accurate but offers a viable alternative when the annotation budget is limited.
Recent advances in deep learning have led to robust automated tools for segmentation of abdominal computed tomography (CT). Segmentation of magnetic resonance imaging (MRI) is substantially more challenging because of the inherent signal variability and the greater effort required to annotate training datasets. As a result, existing approaches are trained on limited sets of MRI sequences, which may limit their generalizability. To characterize the landscape of MRI abdominal segmentation tools, we present a comprehensive benchmark of three state-of-the-art, open-source models: MRSegmentator, MRISegmentator-Abdomen, and TotalSegmentator MRI. Since these models are trained using labor-intensive manual annotation cycles, we also introduce and evaluate ABDSynth, a SynthSeg-based model trained purely on widely available CT segmentations (no real images). We assess accuracy and generalizability on three public datasets, none of which was seen by any of the evaluated methods during training. Together these datasets span all major manufacturers, five MRI sequences, and a variety of subject conditions, voxel resolutions, and fields of view. Our results show that MRSegmentator achieves the best performance and is the most generalizable. ABDSynth yields slightly less accurate results, but its relaxed training-data requirements make it an alternative when the annotation budget is limited. The evaluation code and datasets are available for future benchmarking at https://github.com/deepakri201/AbdoBench, along with inference code and weights for ABDSynth.
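The abstract does not say which accuracy metric is reported. As an illustration only, the sketch below computes a per-organ Dice overlap, a common choice in segmentation benchmarks, between a predicted and a reference label map. The function name `dice_per_label` and the integer label convention are assumptions made for this example and are not taken from the AbdoBench repository.

```python
import numpy as np


def dice_per_label(pred, ref, labels):
    """Return {label: Dice score} for two integer label maps of equal shape.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    pred = np.asarray(pred)
    ref = np.asarray(ref)
    if pred.shape != ref.shape:
        raise ValueError("prediction and reference must have the same shape")

    scores = {}
    for label in labels:
        pred_mask = pred == label
        ref_mask = ref == label
        denom = int(pred_mask.sum()) + int(ref_mask.sum())
        if denom == 0:
            # Organ absent from both maps: Dice is undefined, so report NaN.
            scores[label] = float("nan")
        else:
            overlap = int(np.logical_and(pred_mask, ref_mask).sum())
            scores[label] = 2.0 * overlap / denom
    return scores


# Toy 2D example with two organ labels (1 and 2); 0 is background.
toy_pred = np.array([[1, 1, 0], [2, 0, 0]])
toy_ref = np.array([[1, 0, 0], [2, 2, 0]])
toy_scores = dice_per_label(toy_pred, toy_ref, labels=[1, 2, 3])
```

In a benchmark across several models, scores like these would typically be computed only over the organs that all compared models segment, since the label sets of the tools may differ.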