CV, AI, CL | Aug 19, 2024

R2GenCSR: Mining Contextual and Residual Information for LLMs-based Radiology Report Generation

arXiv:2408.09743v2 | 20 citations | h-index: 12 | Has Code
Originality: Incremental advance
AI Analysis

This work targets computational efficiency and feature enhancement in radiology report generation, an incremental improvement for medical AI applications.

The paper tackles two problems in LLM-based radiology report generation: extracting more effective information for the LLM and reducing computational complexity. Its linear-complexity vision backbone achieves performance comparable to strong Transformer models, and the approach is validated on three X-ray datasets.

Inspired by the tremendous success of Large Language Models (LLMs), existing radiology report generation methods attempt to leverage large models to achieve better performance. They usually adopt a Transformer to extract the visual features of a given X-ray image and then feed them into the LLM for text generation. How to extract more effective information for the LLMs to help them improve final results is an urgent problem that needs to be solved. Additionally, the use of visual Transformer models brings high computational complexity. To address these issues, this paper proposes a novel context-guided efficient radiology report generation framework. Specifically, we introduce Mamba as the vision backbone with linear complexity, and the performance obtained is comparable to that of the strong Transformer model. More importantly, we perform context retrieval from the training set for samples within each mini-batch during the training phase, utilizing both positively and negatively related samples to enhance feature representation and discriminative learning. Subsequently, we feed the vision tokens, context information, and prompt statements to invoke the LLM for generating high-quality medical reports. Extensive experiments on three X-ray report generation datasets (i.e., IU X-Ray, MIMIC-CXR, CheXpert Plus) fully validated the effectiveness of our proposed model. The source code is available at https://github.com/Event-AHU/Medical_Image_Analysis.
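
The context-retrieval step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how positively and negatively related samples are chosen, so the sketch assumes cosine similarity over feature vectors (most similar as positive, least similar as negative). The function names `retrieve_context`, `residual`, and `build_llm_input`, the residual being a plain feature difference, and the toy feature bank are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_context(query, bank, k=1):
    """Return (positive_indices, negative_indices) into the training bank.

    ASSUMPTION: positives are the k most similar bank entries and
    negatives the k least similar; the paper may use another criterion.
    """
    ranked = sorted(range(len(bank)),
                    key=lambda i: cosine(query, bank[i]),
                    reverse=True)
    return ranked[:k], ranked[-k:]


def residual(query, context):
    """Element-wise difference between a sample and a context feature
    (one simple reading of 'residual information', assumed here)."""
    return [q - c for q, c in zip(query, context)]


def build_llm_input(vision_tokens, context_tokens, prompt_tokens):
    """Concatenate the three token groups named in the abstract.
    The ordering is an assumption made for illustration."""
    return vision_tokens + context_tokens + prompt_tokens


# Toy training-set feature bank (hypothetical values).
bank = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
query = [2.0, 0.0]

pos, neg = retrieve_context(query, bank, k=1)
pos_residual = residual(query, bank[pos[0]])
sequence = build_llm_input(["<vis>"], ["<ctx+>", "<ctx->"], ["<prompt>"])
```

With the toy bank, the first entry points in the same direction as the query and is retrieved as the positive sample, while the third points the opposite way and is retrieved as the negative one. In a real pipeline the features would come from the vision backbone, and `k` should stay small relative to the bank so the positive and negative sets do not overlap.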

Code Implementations: 1 repo
