RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance
This work addresses the need for conversational AI tools that improve the quality and efficiency of radiology reporting. It is an incremental advance that adapts existing methods to a specialized domain.
The authors tackle the problem of generating and discussing clinically correct radiology reports from medical images. They introduce RaDialog, a large vision-language model that integrates visual features and structured findings with a large language model. RaDialog achieves state-of-the-art clinical correctness in report generation and shows strong interactive capabilities, such as correcting reports and answering questions.
Conversational AI tools that can generate and discuss clinically correct radiology reports for a given medical image have the potential to transform radiology. Such a human-in-the-loop radiology assistant could facilitate a collaborative diagnostic process, thus saving time and improving the quality of reports. Towards this goal, we introduce RaDialog, the first thoroughly evaluated and publicly available large vision-language model for radiology report generation and interactive dialog. RaDialog effectively integrates visual image features and structured pathology findings with a large language model (LLM) while simultaneously adapting it to a specialized domain using parameter-efficient fine-tuning. To preserve the conversational abilities of the underlying LLM, we propose a comprehensive, semi-automatically labeled, image-grounded instruct dataset for chest X-ray radiology tasks. By training with this dataset, our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions, serving as a foundational step toward clinical dialog systems. Our code is available on GitHub: https://github.com/ChantalMP/RaDialog.
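To make the two ingredients named in the abstract more concrete (combining image features with structured pathology findings in the LLM input, and parameter-efficient fine-tuning), here is a minimal, dependency-free sketch. It is illustrative only and is not taken from the RaDialog code. The prompt template, the `<IMG>` placeholder token, the label names, the "No Finding" default, and the choice of LoRA as the fine-tuning method are all assumptions. The abstract does not say which parameter-efficient method is used. The projection of image features into the LLM embedding space is omitted; the placeholders only mark where projected image embeddings would be spliced into the token sequence.

```python
"""Hypothetical sketch of a RaDialog-style input pipeline.

Assumptions (not from the paper): the "<IMG>" placeholder, the prompt
template, the label names, and LoRA as the parameter-efficient method.
"""
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def format_findings(labels: Dict[str, bool]) -> str:
    """Turn structured pathology predictions into a text fragment for the LLM."""
    positive = [name for name, present in labels.items() if present]
    return ", ".join(positive) if positive else "No Finding"


def build_prompt(labels: Dict[str, bool], instruction: str, num_image_tokens: int) -> str:
    """Put image-token placeholders and the findings text ahead of the instruction."""
    image_slots = " ".join(["<IMG>"] * num_image_tokens)
    return (
        f"Image: {image_slots}\n"
        f"Predicted findings: {format_findings(labels)}\n"
        f"Instruction: {instruction}"
    )


def lora_linear(x: Vector, w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Vector:
    """LoRA-style forward pass: y = W x + (alpha / r) * B (A x).

    W (d_out x d_in) stays frozen; only the low-rank factors A (r x d_in)
    and B (d_out x r) would be trained, so few parameters change.
    """
    r = len(a)  # rank of the low-rank update
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [y0 + (alpha / r) * d for y0, d in zip(base, delta)]


if __name__ == "__main__":
    labels = {"Cardiomegaly": True, "Edema": False, "Pleural Effusion": True}
    print(build_prompt(labels, "Write the radiology report.", num_image_tokens=3))

    # Toy dimensions: d_in = d_out = 2, rank r = 1.
    w = [[1.0, 0.0], [0.0, 1.0]]  # frozen pretrained weight (identity here)
    a = [[1.0, 1.0]]              # trainable down-projection, shape 1 x 2
    b = [[0.5], [0.0]]            # trainable up-projection, shape 2 x 1
    print(lora_linear([2.0, 3.0], w, a, b, alpha=2.0))  # [7.0, 3.0]
```

The design choice the sketch is meant to convey is that predicted findings enter the LLM as plain text next to the image tokens, so the language model can condition on them directly. The low-rank update lets a large frozen model adapt to radiology text while training only the small matrices `A` and `B`.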