CV | Nov 27, 2024

FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models

Alice Heiman, Xiaoman Zhang, Emma Chen, Sung Eun Kim, Pranav Rajpurkar

arXiv:2411.18672v3 | 12.812 citations | h-index: 38 | Has Code | CVPR

Originality: Incremental advance

AI Analysis

This work addresses clinical reliability issues in medical AI by mitigating inaccurate quantitative measurements in radiology reports. It represents a domain-specific, incremental improvement.

The paper tackles measurement hallucinations in chest X-ray report generation models by introducing FactCheXcker, a modular framework that improves 10 of the 11 models tested and achieves an average improvement of 135.0% in measurement hallucinations, as measured by mean absolute error.

Medical vision-language models often struggle with generating accurate quantitative measurements in radiology reports, leading to hallucinations that undermine clinical reliability. We introduce FactCheXcker, a modular framework that de-hallucinates radiology report measurements by leveraging an improved query-code-update paradigm. Specifically, FactCheXcker employs specialized modules and the code generation capabilities of large language models to solve measurement queries generated based on the original report. After extracting measurable findings, the results are incorporated into an updated report. We evaluate FactCheXcker on endotracheal tube placement, which accounts for an average of 78% of report measurements, using the MIMIC-CXR dataset and 11 medical report-generation models. Our results show that FactCheXcker significantly reduces hallucinations, improves measurement precision, and maintains the quality of the original reports. Specifically, FactCheXcker improves the performance of 10/11 models and achieves an average improvement of 135.0% in reducing measurement hallucinations measured by mean absolute error. Code is available at https://github.com/rajpurkarlab/FactCheXcker.
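The query-code-update flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the regular expression, the function names (`find_query`, `update_report`, `mean_absolute_error`), and the `measure` callback are all hypothetical stand-ins, and the sketch assumes the report states the tube-to-carina distance in centimeters.

```python
import re
from typing import Callable, Optional, Sequence

# Hypothetical pattern (an assumption, not from the paper): matches a
# distance such as "7.0 cm" when it is followed by "above the carina"
# or "from the carina".
MEASUREMENT = re.compile(r"(\d+(?:\.\d+)?)\s*cm(?=\s+(?:above|from)\s+the\s+carina)")


def find_query(report: str) -> Optional[float]:
    """Query step: return the reported tube-to-carina distance in cm, if any."""
    match = MEASUREMENT.search(report)
    return float(match.group(1)) if match else None


def update_report(report: str, measure: Callable[[], float]) -> str:
    """Update step: replace the reported distance with a measured one.

    `measure` stands in for the code step, where generated code and
    specialized modules would compute the distance from the image.
    """
    if find_query(report) is None:
        return report  # nothing measurable to verify
    measured = measure()
    return MEASUREMENT.sub(f"{measured:.1f} cm", report, count=1)


def mean_absolute_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Average absolute difference between predicted and reference values."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


report = "The endotracheal tube terminates 7.0 cm above the carina."
updated = update_report(report, measure=lambda: 4.2)  # 4.2 is a made-up value
print(updated)
```

Only the measurement is rewritten in this sketch; the rest of the sentence is left untouched, which mirrors the abstract's stated goal of maintaining the quality of the original report.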

