IV CV LG · Jun 26, 2025

Exploring the Design Space of 3D MLLMs for CT Report Generation

Mohammed Baharoon, Jun Ma, Congyu Fang, Augustin Toma, Bo Wang

arXiv:2506.21535v28.61 citations · h-index: 8 · Has Code · MICCAI

Originality: Incremental advance

AI Analysis

This work addresses automated radiology report generation for medical professionals, but its contribution is incremental: it optimizes existing methods within the setting of a specific challenge.

The authors systematically explored design choices for 3D multimodal large language models for CT report generation and introduced two knowledge-based report augmentation methods that improve the GREEN score by up to 10%, securing 2nd place in the MICCAI 2024 AMOS-MM challenge.

Multimodal Large Language Models (MLLMs) have emerged as a promising way to automate Radiology Report Generation (RRG). In this work, we systematically investigate the design space of 3D MLLMs, including visual input representation, projectors, Large Language Models (LLMs), and fine-tuning techniques for 3D CT report generation. We also introduce two knowledge-based report augmentation methods that improve performance on the GREEN score by up to 10%, achieving 2nd place in the MICCAI 2024 AMOS-MM challenge. Our results on the 1,687 cases from the AMOS-MM dataset show that RRG performance is largely independent of LLM size under the same training protocol. We also show that a larger volume size does not always improve performance if the original ViT was pre-trained on a smaller volume size. Lastly, we show that using a segmentation mask along with the CT volume improves performance. The code is publicly available at https://github.com/bowang-lab/AMOS-MM-Solution
