CVJan 25, 2024Code
MambaMorph: a Mamba-based Framework for Medical MR-CT Deformable RegistrationTao Guo, Yinuo Wang, Shihao Shu et al.
Capturing voxel-wise spatial correspondence across distinct modalities is crucial for medical image analysis. However, current registration approaches are not practical enough in terms of registration accuracy and clinical applicability. In this paper, we introduce MambaMorph, a novel multi-modality deformable registration framework. Specifically, MambaMorph utilizes a Mamba-based registration module and a fine-grained, yet simple, feature extractor for efficient long-range correspondence modeling and high-dimensional feature learning, respectively. Additionally, we develop a well-annotated brain MR-CT registration dataset, SR-Reg, to address the scarcity of data in multi-modality registration. To validate MambaMorph's multi-modality registration capabilities, we conduct quantitative experiments on both our SR-Reg dataset and a public T1-T2 dataset. The experimental results on both datasets demonstrate that MambaMorph significantly outperforms the current state-of-the-art learning-based registration methods in terms of registration accuracy. Further study underscores the efficiency of the Mamba-based registration module and the lightweight feature extractor, which achieve notable registration quality while maintaining reasonable computational costs and speeds. We believe that MambaMorph holds significant potential for practical applications in medical image registration. The code for MambaMorph is available at: https://github.com/Guo-Stone/MambaMorph.
CVMay 14, 2025
Zero-Shot Multi-modal Large Language Model v.s. Supervised Deep Learning: A Comparative Study on CT-Based Intracranial Hemorrhage SubtypingYinuo Wang, Yue Zeng, Kai Chen et al.
Introduction: Timely identification of intracranial hemorrhage (ICH) subtypes on non-contrast computed tomography is critical for prognosis prediction and therapeutic decision-making, yet remains challenging due to low contrast and blurring boundaries. This study evaluates the performance of zero-shot multi-modal large language models (MLLMs) compared to traditional deep learning methods in ICH binary classification and subtyping. Methods: We utilized a dataset provided by RSNA, comprising 192 NCCT volumes. The study compares various MLLMs, including GPT-4o, Gemini 2.0 Flash, and Claude 3.5 Sonnet V2, with conventional deep learning models, including ResNet50 and Vision Transformer. Carefully crafted prompts were used to guide MLLMs in tasks such as ICH presence, subtype classification, localization, and volume estimation. Results: The results indicate that in the ICH binary classification task, traditional deep learning models outperform MLLMs comprehensively. For subtype classification, MLLMs also exhibit inferior performance compared to traditional deep learning models, with Gemini 2.0 Flash achieving an macro-averaged precision of 0.41 and a macro-averaged F1 score of 0.31. Conclusion: While MLLMs excel in interactive capabilities, their overall accuracy in ICH subtyping is inferior to deep networks. However, MLLMs enhance interpretability through language interactions, indicating potential in medical imaging analysis. Future efforts will focus on model refinement and developing more precise MLLMs to improve performance in three-dimensional medical image processing.