CL | AI | CV | Dec 2, 2024

Evaluating Automated Radiology Report Quality through Fine-Grained Phrasal Grounding of Clinical Findings

Berkeley
arXiv:2412.01031v3 | 3 citations | h-index: 68 | ISBI
Originality: Incremental advance
AI Analysis

This work addresses the need for more accurate quality assessment of AI-generated medical reports. It is incremental in that it builds on existing textual metrics by adding visual grounding.

The paper tackles the problem of automatically evaluating the quality of AI-generated radiology reports. Its method extracts fine-grained clinical finding patterns and grounds them to anatomical regions in chest radiographs, then combines textual and visual measures into a single rating. Results on a MIMIC-derived dataset show robustness and sensitivity to factual errors.

Several evaluation metrics have been developed recently to automatically assess the quality of generative AI reports for chest radiographs based only on textual information using lexical, semantic, or clinical named entity recognition methods. In this paper, we develop a new method of report quality evaluation by first extracting fine-grained finding patterns capturing the location, laterality, and severity of a large number of clinical findings. We then perform phrasal grounding to localize their associated anatomical regions on chest radiograph images. The textual and visual measures are then combined to rate the quality of the generated reports. We present results that compare this evaluation metric with other textual metrics on a gold standard dataset derived from the MIMIC collection and show its robustness and sensitivity to factual errors.
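The abstract does not give the scoring formula, so the following is only a minimal sketch of how a textual measure and a visual measure could be combined. Everything in it is an assumption made for illustration: the `Finding` fields, exact-match comparison of finding patterns, F1 as the textual measure, mean box IoU as the visual measure, and the weight `alpha`. None of these names or choices come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Finding:
    """Hypothetical fine-grained finding pattern (fields are assumptions)."""
    name: str
    location: str
    laterality: str
    severity: str


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def report_score(reference: Dict[Finding, Box],
                 generated: Dict[Finding, Box],
                 alpha: float = 0.5) -> float:
    """Weighted sum of textual F1 and mean IoU over matched findings.

    This combination rule is an illustrative assumption, not the paper's metric.
    """
    if not reference and not generated:
        return 1.0  # two empty reports agree trivially
    matched = set(reference) & set(generated)  # exact pattern match (assumption)
    precision = len(matched) / len(generated) if generated else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    denom = precision + recall
    text_f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    visual = (sum(iou(reference[f], generated[f]) for f in matched) / len(matched)
              if matched else 0.0)
    return alpha * text_f1 + (1 - alpha) * visual
```

Under these assumptions, a generated report that names the right finding but flips its laterality shares no pattern with the reference and scores 0, which is the kind of factual-error sensitivity the abstract describes.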

