MedTVT-R1: A Multimodal LLM Empowering Medical Reasoning and Diagnosis
This work addresses limited diagnostic accuracy in medical research, a problem that affects both clinicians and patients. It appears incremental, however, because it builds on existing multimodal LLM approaches with specific enhancements.
The paper tackles accurate multi-disease diagnosis from multimodal medical data. It proposes MedTVT-R1, a multimodal LLM framework that integrates clinical data for reasoning and diagnosis, and reports superior multimodal feature utilization and multi-disease diagnosis, with potential for clinical applications.
Accurate and interpretable multi-disease diagnosis remains a critical challenge in medical research, particularly when leveraging heterogeneous multimodal medical data. Current approaches often rely on single-modal data, limiting their ability to comprehensively understand complex diseases. To address this, we propose MedTVT-R1, a novel Multimodal Large Language Model (MLLM) framework designed to integrate clinical multimodal data for reasoning and diagnosing multiple diseases. We construct MedTVT-QA, a curated instruction dataset that provides question-answer pairs for physiological-level interpretations and disease-level diagnoses with a Chain of Evidence approach. MedTVT-R1 incorporates a modality perception layer to capture inter-modal dependencies and adaptively weight modality contributions. Additionally, we employ Group Relative Policy Optimization (GRPO)-based Reinforcement Fine-Tuning with a Jaccard Reward function to enhance diagnostic reasoning. Experimental results demonstrate MedTVT-R1's superiority in multimodal feature utilization and multi-disease diagnosis, offering significant potential for clinical applications such as diagnostic report generation and comorbidity reasoning. The dataset and code are available at https://github.com/keke-nice/MedTVT-R1.
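The abstract names a Jaccard Reward used within GRPO-based reinforcement fine-tuning but does not spell out its form. The following is a minimal sketch, assuming the reward is the standard Jaccard index between the predicted and reference disease sets, and that GRPO advantages are obtained by normalizing rewards within a group of sampled responses. The function names, the label normalization, the handling of two empty sets, and the example disease labels are all illustrative assumptions, not the authors' implementation.

```python
from statistics import mean, pstdev


def jaccard_reward(predicted, reference):
    """Jaccard index |A ∩ B| / |A ∪ B| between two sets of disease labels.

    Assumption: labels are compared case-insensitively after trimming.
    Assumption: two empty sets count as a perfect match (reward 1.0).
    """
    pred = {label.strip().lower() for label in predicted}
    ref = {label.strip().lower() for label in reference}
    union = pred | ref
    if not union:
        return 1.0
    return len(pred & ref) / len(union)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group (zero mean, unit scale).

    This mirrors the group-relative idea in GRPO: each response is scored
    against the other responses sampled for the same prompt.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical example: one reference diagnosis set and four sampled answers.
reference = ["diabetes", "hypertension", "heart failure"]
group = [
    ["diabetes", "hypertension", "heart failure"],  # exact match
    ["diabetes", "hypertension"],                   # one disease missed
    ["diabetes", "sepsis"],                         # one hit, one false positive
    ["sepsis"],                                     # no overlap
]

rewards = [jaccard_reward(p, reference) for p in group]
advantages = group_relative_advantages(rewards)
```

Under these assumptions, a set-overlap reward gives partial credit for partially correct multi-disease predictions and penalizes both missed and spurious diagnoses, which an exact-match reward would not distinguish.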