Zero-shot System for Automatic Body Region Detection for Volumetric CT and MR Images
This work addresses the need for reliable anatomical region identification in medical imaging workflows. It offers a zero-shot solution that reduces dependence on unreliable metadata, although the contribution is incremental because it builds on existing pre-trained models.
The paper tackles automatic body region detection in volumetric CT and MR images by proposing and evaluating three zero-shot pipelines. The segmentation-driven rule-based approach performs best, with weighted F1-scores of 0.947 for CT and 0.914 for MR.
Reliable identification of anatomical body regions is a prerequisite for many automated medical imaging workflows, yet existing solutions remain heavily dependent on unreliable DICOM metadata. Current solutions mainly use supervised learning, which limits their applicability in many real-world scenarios. In this work, we investigate whether body region detection in volumetric CT and MR images can be achieved in a fully zero-shot manner by using knowledge embedded in large pre-trained foundation models. We propose and systematically evaluate three training-free pipelines: (1) a segmentation-driven rule-based system leveraging pre-trained multi-organ segmentation models, (2) a Multimodal Large Language Model (MLLM) guided by radiologist-defined rules, and (3) a segmentation-aware MLLM that combines visual input with explicit anatomical evidence. All methods are evaluated on 887 heterogeneous CT and MR scans with manually verified anatomical region labels. The segmentation-driven rule-based approach achieves the strongest and most consistent performance, with weighted F1-scores of 0.947 (CT) and 0.914 (MR), demonstrating robustness across modalities and atypical scan coverage. The MLLM performs competitively in visually distinctive regions, while the segmentation-aware MLLM reveals fundamental limitations.
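To make the segmentation-driven idea and the reported metric concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a pre-trained multi-organ segmentation model has already produced per-organ voxel counts for a scan. The organ-to-region table, the `min_voxels` threshold, and the function names `detect_regions` and `weighted_f1` are all hypothetical, and the metric is written for the single-label case, which may differ from the paper's evaluation protocol.

```python
from collections import Counter

# Hypothetical organ-to-region table. The actual rules used in the paper
# are not given in the abstract; these entries are illustrative only.
ORGAN_TO_REGION = {
    "brain": "head",
    "thyroid": "neck",
    "lung": "chest",
    "heart": "chest",
    "liver": "abdomen",
    "kidney": "abdomen",
    "urinary_bladder": "pelvis",
    "femur": "lower_extremity",
}


def detect_regions(organ_voxels, min_voxels=500):
    """Return the sorted body regions supported by the segmentation.

    organ_voxels maps an organ name to its segmented voxel count.
    A region is reported when the summed voxel count of its organs
    reaches min_voxels (an assumed threshold, not from the paper).
    """
    totals = Counter()
    for organ, count in organ_voxels.items():
        region = ORGAN_TO_REGION.get(organ)
        if region is not None:  # ignore organs without a rule
            totals[region] += count
    return sorted(r for r, n in totals.items() if n >= min_voxels)


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 for single-label predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1  # weight each class by its number of true samples
    return total / len(y_true)
```

For example, `detect_regions({"lung": 4000, "heart": 900, "liver": 1200, "kidney": 100, "femur": 20})` returns `["abdomen", "chest"]`: the 20 femur voxels fall below the assumed threshold, which is one simple way a rule can tolerate partial or atypical scan coverage.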