CL AI CV · Dec 2, 2024

Evaluating Automated Radiology Report Quality through Fine-Grained Phrasal Grounding of Clinical Findings

Razi Mahmood, Pingkun Yan, Diego Machado Reyes, Ge Wang, Mannudeep K. Kalra, Parisa Kaviani, Joy T. Wu, Tanveer Syeda-Mahmood

Berkeley

arXiv:2412.01031v31.93 citations · h-index: 86 · ISBI

Originality: Incremental advance

AI Analysis

This work addresses the need for more accurate quality assessment of AI-generated medical reports. It is an incremental advance: it builds on existing text-only metrics by adding visual grounding.

The paper tackles the problem of automatically evaluating the quality of AI-generated radiology reports. It develops a method that extracts fine-grained clinical finding patterns and grounds them to anatomical regions in chest radiographs, then combines textual and visual measures into a single rating. On a MIMIC-derived dataset, the resulting metric is reported to be robust and sensitive to factual errors.

Several evaluation metrics have been developed recently to automatically assess the quality of generative AI reports for chest radiographs based only on textual information, using lexical, semantic, or clinical named entity recognition methods. In this paper, we develop a new method of report quality evaluation by first extracting fine-grained finding patterns that capture the location, laterality, and severity of a large number of clinical findings. We then perform phrasal grounding to localize their associated anatomical regions on chest radiograph images. The textual and visual measures are then combined to rate the quality of the generated reports. We present results that compare this evaluation metric with other textual metrics on a gold standard dataset derived from the MIMIC collection and show its robustness and sensitivity to factual errors.
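The abstract does not give the scoring formula, so the following is only a minimal sketch of how a textual measure and a visual measure might be combined. Everything in it is an assumption made for illustration: the `Finding` record, the attribute-overlap text score, the use of box IoU as the visual measure, the `alpha` weighting, and the best-match aggregation are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical pixel coords


@dataclass(frozen=True)
class Finding:
    """Hypothetical fine-grained finding pattern with an optional grounded region."""
    name: str
    location: Optional[str] = None
    laterality: Optional[str] = None
    severity: Optional[str] = None
    box: Optional[Box] = None


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def text_similarity(ref: Finding, gen: Finding) -> float:
    """Assumed text score: 0 if the finding names differ, else the fraction of
    matching fields among name, location, laterality, and severity."""
    if ref.name != gen.name:
        return 0.0
    attrs = ("location", "laterality", "severity")
    matches = sum(getattr(ref, a) == getattr(gen, a) for a in attrs)
    return (1 + matches) / (1 + len(attrs))


def report_score(reference: Sequence[Finding],
                 generated: Sequence[Finding],
                 alpha: float = 0.5) -> float:
    """Assumed combination: for each reference finding take the best-matching
    generated finding, score it as alpha * text + (1 - alpha) * IoU, and
    normalise by the larger finding count so extra findings are penalised."""
    if not reference and not generated:
        return 1.0
    total = 0.0
    for ref in reference:
        best = 0.0
        for gen in generated:
            t = text_similarity(ref, gen)
            if t == 0.0:
                continue  # different finding: contributes nothing
            v = iou(ref.box, gen.box) if ref.box and gen.box else 0.0
            best = max(best, alpha * t + (1 - alpha) * v)
        total += best
    return total / max(len(reference), len(generated))
```

Under these assumptions, a generated report that names the right finding but gets the laterality wrong and grounds it to a shifted region scores lower on both terms, which is the kind of sensitivity to factual errors the abstract describes.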
