CV, CL · Nov 25, 2024

LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation

Steven Song, Anirudh Subramanyam, Irene Madejski, Robert L. Grossman

arXiv:2411.16523v2 · h-index: 2 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses radiology report generation from medical images. It offers an approach that avoids fine-tuning large models, although its originality is incremental: it combines existing techniques such as retrieval-augmented generation (RAG) and label-based methods.

The authors tackle radiology report generation with LaB-RAG, a method that pairs image-derived labels with retrieval-augmented generation and pretrained LLMs. On the MIMIC-CXR and CheXpert Plus datasets, it outperforms other retrieval-based methods and is competitive with fine-tuned models.

In the current paradigm of image captioning, deep learning models are trained to generate text from image embeddings of latent features. We challenge the assumption that fine-tuning of large, bespoke models is required to improve model generation accuracy. Here we propose Label Boosted Retrieval Augmented Generation (LaB-RAG), a small-model-based approach to image captioning that leverages image descriptors in the form of categorical labels to boost standard retrieval augmented generation (RAG) with pretrained large language models (LLMs). We study our method in the context of radiology report generation (RRG) over MIMIC-CXR and CheXpert Plus. We argue that simple classification models combined with zero-shot embeddings can effectively transform X-rays into text-space as radiology-specific labels. In combination with standard RAG, we show that these derived text labels can be used with general-domain LLMs to generate radiology reports. Without ever training our generative language model or image embedding models specifically for the task, and without ever directly "showing" the LLM an X-ray, we demonstrate that LaB-RAG achieves better results across natural language and radiology language metrics compared with other retrieval-based RRG methods, while attaining competitive results compared to other fine-tuned vision-language RRG models. We further conduct extensive ablation experiments to better understand the components of LaB-RAG. Our results suggest broader compatibility and synergy with fine-tuned methods to further enhance RRG performance.
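The abstract describes the pipeline only at a high level. What follows is a minimal pure-Python sketch of one plausible reading of it, not the authors' implementation. Assumptions: frozen image embeddings (toy 2-D vectors standing in for zero-shot embeddings from a pretrained image model) are turned into labels by one logistic classifier per label; retrieval ranks a report database by cosine similarity and keeps only candidates whose label set exactly matches the query's, falling back to the plain ranking when nothing matches; the labels and retrieved reports are then formatted into a text-only prompt for a general-domain LLM. The names `Example`, `predict_labels`, `label_boosted_retrieve`, `format_labels`, and `build_prompt`, along with the label names, weights, threshold, fallback rule, and prompt wording, are all hypothetical. The LLM call itself is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    """One database entry: frozen image embedding, labels, report (hypothetical schema)."""
    embedding: list
    labels: frozenset
    report: str


def predict_labels(embedding, classifiers, threshold=0.5):
    """Apply one logistic classifier per label to a frozen embedding (assumed classifier type)."""
    predicted = set()
    for label, (weights, bias) in classifiers.items():
        logit = sum(w * x for w, x in zip(weights, embedding)) + bias
        if 1.0 / (1.0 + math.exp(-logit)) >= threshold:
            predicted.add(label)
    return frozenset(predicted)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_boosted_retrieve(query_embedding, query_labels, database, k=2):
    """Rank by image-embedding similarity, then keep only exact label-set matches.

    Falls back to the unfiltered similarity ranking when no candidate shares
    the query's label set (an assumed fallback, not taken from the paper).
    """
    ranked = sorted(database, key=lambda ex: cosine(query_embedding, ex.embedding), reverse=True)
    matched = [ex for ex in ranked if ex.labels == query_labels]
    return (matched or ranked)[:k]


def format_labels(labels):
    """Render a label set as text; "No Finding" is a hypothetical default for an empty set."""
    return ", ".join(sorted(labels)) or "No Finding"


def build_prompt(query_labels, retrieved):
    """Text-only prompt: the LLM sees labels and reports, never the X-ray itself."""
    lines = ["Write the findings section of a chest X-ray report."]
    for i, ex in enumerate(retrieved, start=1):
        lines.append(f"Example {i} labels: {format_labels(ex.labels)}")
        lines.append(f"Example {i} report: {ex.report}")
    lines.append(f"Labels for the new study: {format_labels(query_labels)}")
    lines.append("Report:")
    return "\n".join(lines)


# Toy usage with 2-D "embeddings" and hand-set classifier weights (illustrative only).
classifiers = {"Cardiomegaly": ([2.0, 0.0], -1.0), "Edema": ([0.0, 2.0], -1.0)}
database = [
    Example([1.0, 0.1], frozenset({"Cardiomegaly"}), "Enlarged cardiac silhouette."),
    Example([0.9, 0.9], frozenset({"Cardiomegaly", "Edema"}), "Cardiomegaly with mild edema."),
    Example([0.1, 1.0], frozenset({"Edema"}), "Interstitial edema."),
]
query = [1.0, 0.2]
labels = predict_labels(query, classifiers)
retrieved = label_boosted_retrieve(query, labels, database)
prompt = build_prompt(labels, retrieved)
print(prompt)
```

Here the query is labeled only "Cardiomegaly", so the label filter discards the nearest neighbours that also mention edema, and the prompt carries one matching example report. Exact label-set matching is strict; with a larger label space, matches may be rare, which is why the sketch falls back to plain similarity ranking.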

