AI · Jul 27, 2023
Fact-Checking of AI-Generated Reports
Razi Mahmood, Diego Machado Reyes, Ge Wang et al. · berkeley
With advances in generative artificial intelligence (AI), it is now possible to produce realistic-looking automated reports for preliminary reads of radiology images. This can expedite clinical workflows, improve accuracy, and reduce overall costs. However, it is also well known that such models often hallucinate, leading to false findings in the generated reports. In this paper, we propose a new method for fact-checking AI-generated reports using their associated images. Specifically, the developed examiner differentiates real and fake sentences in reports by learning the association between an image and sentences describing real or potentially fake findings. To train such an examiner, we first created a new dataset of fake reports by perturbing the findings in the original ground truth radiology reports associated with images. Text encodings of real and fake sentences drawn from these reports are then paired with image encodings to learn the mapping to real/fake labels. The utility of such an examiner is demonstrated for verifying automatically generated reports by detecting and removing fake sentences. Future generative AI approaches can use the resulting tool to validate their reports, leading to a more responsible use of AI in expediting clinical workflows.
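The pairing of text and image encodings with real/fake labels can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: random vectors stand in for the image and text encoders, "real" sentences are simulated as noisy copies of their image encoding, and a logistic regression over an elementwise-product feature stands in for the examiner. The abstract does not specify the paper's encoders or classifier, so this is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_pairs = 16, 200

# Stand-in encodings (assumption): random vectors replace real image/text encoders.
image_enc = rng.normal(size=(2 * n_pairs, dim))
real_text = image_enc[:n_pairs] + 0.1 * rng.normal(size=(n_pairs, dim))  # agrees with its image
fake_text = rng.normal(size=(n_pairs, dim))                               # unrelated to its image
text_enc = np.vstack([real_text, fake_text])
labels = np.concatenate([np.ones(n_pairs), np.zeros(n_pairs)])  # 1 = real, 0 = fake


def pair_features(img, txt):
    """Joint feature for each (image, sentence) pair: elementwise product."""
    return img * txt


def train_examiner(features, targets, lr=0.1, steps=2000):
    """Fit a logistic-regression examiner with plain gradient descent."""
    weights = np.zeros(features.shape[1])
    bias = 0.0
    for _ in range(steps):
        logits = np.clip(features @ weights + bias, -30, 30)
        probs = 1.0 / (1.0 + np.exp(-logits))
        error = probs - targets
        weights -= lr * features.T @ error / len(targets)
        bias -= lr * error.mean()
    return weights, bias


def predict_real(features, weights, bias):
    """Return True where the examiner judges the sentence to be real."""
    return (features @ weights + bias) > 0


features = pair_features(image_enc, text_enc)
weights, bias = train_examiner(features, labels)
accuracy = (predict_real(features, weights, bias) == labels.astype(bool)).mean()
print(f"training accuracy on synthetic pairs: {accuracy:.2f}")
```

In a report-cleaning step, sentences for which `predict_real` returns False would be the candidates for removal.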
CV · Sep 20, 2025
Phrase-grounded Fact-checking for Automatically Generated Chest X-ray Reports
Razi Mahmood, Diego Machado-Reyes, Joy Wu et al. · berkeley
With the emergence of large-scale vision-language models (VLMs), it is now possible to produce realistic-looking radiology reports for chest X-ray images. However, their clinical translation has been hampered by the factual errors and hallucinations in the descriptions produced during inference. In this paper, we present a novel phrase-grounded fact-checking model (FC model) that detects errors in findings and their indicated locations in automatically generated chest radiology reports. Specifically, we simulate the errors in reports through a large synthetic dataset derived by perturbing findings and their locations in ground truth reports to form real and fake finding-location pairs with images. A new multi-label cross-modal contrastive regression network is then trained on this dataset. We present results demonstrating the robustness of our method in terms of accuracy of finding veracity prediction and localization on multiple X-ray datasets. We also show its effectiveness for error detection in reports of SOTA report generators on multiple datasets, achieving a concordance correlation coefficient of 0.997 with ground truth-based verification, thus pointing to its utility during clinical inference in radiology workflows.
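For readers unfamiliar with the agreement metric quoted above, the following sketch computes Lin's concordance correlation coefficient between two score lists. The function name and the example scores are hypothetical; the abstract does not say how the per-report scores were produced.

```python
import numpy as np


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient between two score vectors.

    Unlike Pearson correlation, it also penalizes differences in mean and scale.
    Population (biased) variance and covariance are used throughout.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    covariance = np.mean((x - x.mean()) * (y - y.mean()))
    return 2.0 * covariance / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)


# Hypothetical per-report scores: model-based verification vs. ground-truth-based verification.
model_scores = [0.90, 0.75, 0.60, 0.85]
reference_scores = [0.88, 0.77, 0.58, 0.86]
print(concordance_correlation(model_scores, reference_scores))
```

A value near 1 means the two verification routes agree in both ranking and absolute level, which is a stricter condition than a high Pearson correlation.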
IV · Oct 17, 2024
Scalable Drift Monitoring in Medical Imaging AI
Jameson Merkow, Felix J. Dorfner, Xiyu Yang et al.
The integration of artificial intelligence (AI) into medical imaging has advanced clinical diagnostics but poses challenges in managing model drift and ensuring long-term reliability. To address these challenges, we develop MMC+, an enhanced framework for scalable drift monitoring, building upon the CheXstray framework that introduced real-time drift detection for medical imaging AI models using multi-modal data concordance. This work extends the original framework's methodologies, providing a more scalable and adaptable solution for real-world healthcare settings and offering a reliable and cost-effective alternative to continuous performance monitoring that addresses limitations of both continuous and periodic monitoring methods. MMC+ introduces critical improvements to the original framework, including more robust handling of diverse data streams, improved scalability through the integration of foundation models like MedImageInsight for high-dimensional image embeddings without site-specific training, and the introduction of uncertainty bounds to better capture drift in dynamic clinical environments. Validated with real-world data from Massachusetts General Hospital during the COVID-19 pandemic, MMC+ effectively detects significant data shifts and correlates them with model performance changes. While not directly predicting performance degradation, MMC+ serves as an early warning system, indicating when AI systems may deviate from acceptable performance bounds and enabling timely interventions. By emphasizing the importance of monitoring diverse data streams and evaluating data shifts alongside model performance, this work contributes to the broader adoption and integration of AI solutions in clinical settings.
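The idea of flagging drift against an uncertainty bound can be sketched in a few lines. This is a deliberately simplified stand-in, not the MMC+ method: it assumes image embeddings are already available as arrays (random vectors here), scores a window by the distance between its mean embedding and the reference mean, and derives an upper bound by bootstrapping windows from the reference period. All function names and parameter values are hypothetical.

```python
import numpy as np


def drift_score(window, reference_mean):
    """Distance between a window's mean embedding and the reference mean."""
    return float(np.linalg.norm(window.mean(axis=0) - reference_mean))


def bootstrap_upper_bound(reference, window_size, n_resamples=500, quantile=0.99, seed=0):
    """Upper bound on the drift score expected when no drift is present.

    Windows are resampled with replacement from the reference period and the
    chosen quantile of their drift scores is returned.
    """
    rng = np.random.default_rng(seed)
    reference_mean = reference.mean(axis=0)
    scores = []
    for _ in range(n_resamples):
        idx = rng.integers(0, len(reference), size=window_size)
        scores.append(drift_score(reference[idx], reference_mean))
    return float(np.quantile(scores, quantile))


# Synthetic embeddings (assumption): a reference period plus two monitoring windows.
rng = np.random.default_rng(1)
reference = rng.normal(size=(1000, 8))
stable_window = rng.normal(size=(50, 8))             # same distribution as reference
shifted_window = rng.normal(loc=3.0, size=(50, 8))   # simulated data shift

bound = bootstrap_upper_bound(reference, window_size=50)
reference_mean = reference.mean(axis=0)
stable_score = drift_score(stable_window, reference_mean)
shifted_score = drift_score(shifted_window, reference_mean)
shifted_flagged = shifted_score > bound
print(f"bound={bound:.2f} stable={stable_score:.2f} shifted={shifted_score:.2f}")
```

A window whose score exceeds the bound would raise an early warning; as the abstract notes, such a signal indicates a data shift and does not by itself predict performance degradation.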
CV · Oct 30, 2018
Shape and Margin-Aware Lung Nodule Classification in Low-dose CT Images via Soft Activation Mapping
Yiming Lei, Yukun Tian, Hongming Shan et al.
Many studies on lung nodule classification lack clinical or biological interpretation of the features extracted by convolutional neural networks (CNNs). Methods such as class activation mapping (CAM) and gradient-based CAM (Grad-CAM) are tailored to interpreting localization and classification tasks but ignore fine-grained features. CAM and Grad-CAM therefore cannot provide an optimal interpretation for lung nodule categorization in low-dose CT images, where fine-grained pathological clues such as discrete, irregular nodule shapes and margins can enhance the sensitivity and specificity of CNN-based nodule classification. In this paper, we first develop a soft activation mapping (SAM) to enable fine-grained lung nodule shape & margin (LNSM) feature analysis with a CNN so that it can access rich discrete features. Second, by combining high-level convolutional features with SAM, we propose a high-level feature enhancement scheme (HESAM) to localize LNSM features. Experiments on the LIDC-IDRI dataset indicate that 1) SAM captures more fine-grained and discrete attention regions than existing methods, and 2) HESAM localizes LNSM features more accurately and obtains state-of-the-art predictive performance, reducing the false positive rate. In addition, 3) we design and conduct a visual matching experiment incorporating a radiologist study to increase confidence in applying our method to clinical diagnosis.
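As background for the baseline named above, here is a minimal sketch of standard class activation mapping: the map for a class is the sum of the last convolutional feature maps weighted by that class's classifier weights. It illustrates plain CAM only, not SAM or HESAM, whose details the abstract does not give. The tiny feature maps and weights are made-up values.

```python
import numpy as np


def class_activation_map(feature_maps, class_weights):
    """Standard CAM for one class.

    feature_maps:  array of shape (K, H, W), the last convolutional feature maps.
    class_weights: array of shape (K,), the classifier weights for the class.
    Returns an (H, W) map rescaled to [0, 1].
    """
    cam = np.tensordot(class_weights, feature_maps, axes=1)  # weighted sum over K
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Made-up example: two 2x2 feature maps and the weights of one class.
feature_maps = np.array([
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
])
class_weights = np.array([1.0, 0.5])
cam = class_activation_map(feature_maps, class_weights)
print(cam)
```

In practice the map is upsampled to the input resolution and overlaid on the CT slice. Because the map has the coarse resolution of the last convolutional layer, fine-grained margin detail is easily lost, which is the limitation the paper targets.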