CV · Mar 31, 2024

M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models

arXiv:2404.00578v1 · 156 citations · h-index: 9 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the shortage of 3D medical image analysis tools for clinicians and researchers. The advance is incremental, since it extends existing multi-modal large language models to 3D data.

This paper tackles the under-exploration of 3D medical image analysis by introducing a large-scale dataset (M3D-Data with 120K image-text pairs and 662K instruction-response pairs), a versatile model (M3D-LaMed), and a new benchmark (M3D-Bench), resulting in a robust model that outperforms existing solutions.

Medical image analysis is essential to clinical diagnosis and treatment, which is increasingly supported by multi-modal large language models (MLLMs). However, previous research has primarily focused on 2D medical images, leaving 3D images under-explored, despite their richer spatial information. This paper aims to advance 3D medical image analysis with MLLMs. To this end, we present a large-scale 3D multi-modal medical dataset, M3D-Data, comprising 120K image-text pairs and 662K instruction-response pairs specifically tailored for various 3D medical tasks, such as image-text retrieval, report generation, visual question answering, positioning, and segmentation. Additionally, we propose M3D-LaMed, a versatile multi-modal large language model for 3D medical image analysis. Furthermore, we introduce a new 3D multi-modal medical benchmark, M3D-Bench, which facilitates automatic evaluation across eight tasks. Through comprehensive evaluation, our method proves to be a robust model for 3D medical image analysis, outperforming existing solutions. All code, data, and models are publicly available at: https://github.com/BAAI-DCAI/M3D.

Code Implementations: 2 repos