IV AI CV, Jul 18, 2025

OrthoInsight: Rib Fracture Diagnosis and Report Generation Based on Multi-Modal Large Models

Ningyong Wu, Jinzhi Wang, Wenhong Zhao, Chenzhan Yu, Zhigang Xiu, Duwei Dai

arXiv:2507.13993v2, 2 citations, h-index: 2

Originality: Incremental advance

AI Analysis

OrthoInsight addresses the time-consuming and error-prone manual interpretation of CT scans for rib fractures and is intended to support radiologists. The advance is incremental because it combines existing methods.

The study tackles automated rib fracture diagnosis and report generation from CT scans with OrthoInsight, a multi-modal framework that integrates YOLOv9, a knowledge graph, and LLaVA. It reports an average score of 4.28 across four evaluation criteria (Diagnostic Accuracy, Content Completeness, Logical Coherence, and Clinical Guidance Value), outperforming models like GPT-4 and Claude-3.

The growing volume of medical imaging data has increased the need for automated diagnostic tools, especially for musculoskeletal injuries like rib fractures, commonly detected via CT scans. Manual interpretation is time-consuming and error-prone. We propose OrthoInsight, a multi-modal deep learning framework for rib fracture diagnosis and report generation. It integrates a YOLOv9 model for fracture detection, a medical knowledge graph for retrieving clinical context, and a fine-tuned LLaVA language model for generating diagnostic reports. OrthoInsight combines visual features from CT images with expert textual data to deliver clinically useful outputs. Evaluated on 28,675 annotated CT images and expert reports, it achieves high performance across Diagnostic Accuracy, Content Completeness, Logical Coherence, and Clinical Guidance Value, with an average score of 4.28, outperforming models like GPT-4 and Claude-3. This study demonstrates the potential of multi-modal learning in transforming medical image analysis and providing effective support for radiologists.
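
The three-stage flow described in the abstract (detect, retrieve context, generate) can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' implementation: the `Detection` schema, the toy `KNOWLEDGE_GRAPH` dictionary, the 0.5 confidence threshold, and the `detector` and `language_model` callables are all hypothetical stand-ins for the YOLOv9 model, the medical knowledge graph, and the fine-tuned LLaVA model, none of whose interfaces are given in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Detection:
    """One detected fracture (hypothetical schema, not taken from the paper)."""
    label: str                      # fracture type predicted by the detector
    confidence: float               # detector score in [0, 1]
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


# Toy stand-in for the medical knowledge graph: fracture label -> notes.
# The entries are placeholders, not real clinical guidance.
KNOWLEDGE_GRAPH: Dict[str, List[str]] = {
    "displaced_fracture": ["placeholder clinical note A"],
    "nondisplaced_fracture": ["placeholder clinical note B"],
}


def retrieve_context(detections: List[Detection],
                     graph: Dict[str, List[str]]) -> List[str]:
    """Collect the notes linked to each detected label, without duplicates."""
    notes: List[str] = []
    for det in detections:
        for note in graph.get(det.label, []):
            if note not in notes:
                notes.append(note)
    return notes


def build_prompt(detections: List[Detection], notes: List[str]) -> str:
    """Turn detections and retrieved notes into a text prompt."""
    findings = "\n".join(
        f"- {d.label} (confidence {d.confidence:.2f}) at {d.box}"
        for d in detections
    ) or "- none"
    context = "\n".join(f"- {n}" for n in notes) or "- none"
    return (
        f"Findings:\n{findings}\n"
        f"Clinical context:\n{context}\n"
        "Write a diagnostic report."
    )


def generate_report(image: object,
                    detector: Callable[[object], List[Detection]],
                    language_model: Callable[[object, str], str],
                    graph: Dict[str, List[str]] = KNOWLEDGE_GRAPH,
                    min_conf: float = 0.5) -> str:
    """Run detection, keep confident hits, retrieve context, then generate."""
    confident = [d for d in detector(image) if d.confidence >= min_conf]
    notes = retrieve_context(confident, graph)
    prompt = build_prompt(confident, notes)
    # The multi-modal model receives both the image and the text prompt.
    return language_model(image, prompt)
```

Passing the detector and language model in as callables keeps the sketch independent of any particular model library; a real system would wrap the trained detector and the fine-tuned vision-language model behind these two functions.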
