Quality Control for Radiology Report Generation Models via Auxiliary Auditing Components
This work addresses the challenge of deploying automated radiology report generation in clinical practice by improving reliability assessment, though it appears incremental because it builds on existing disease-classifier methods.
The paper tackles the problem of ensuring clinical accuracy in AI-generated radiology reports by proposing a quality control framework using auxiliary auditing components, which identifies more reliable reports and achieves higher F1 scores compared to unfiltered reports.
Automation of medical image interpretation could alleviate bottlenecks in diagnostic workflows, and has attracted particular interest in recent years due to advancements in natural language processing. Great strides have been made towards automated radiology report generation via AI, yet ensuring clinical accuracy in generated reports remains a significant challenge, hindering deployment of such methods in clinical practice. In this work, we propose a quality control framework for assessing the reliability of AI-generated radiology reports with respect to semantics of diagnostic importance, using modular auxiliary auditing components (ACs). Evaluating our pipeline on the MIMIC-CXR dataset, we find that incorporating ACs in the form of disease classifiers enables auditing that identifies more reliable reports, resulting in higher F1 scores than unfiltered generated reports. Additionally, leveraging the confidence of the AC labels further improves the audit's effectiveness.
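The auditing idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `audit_report`, the 0.5 decision threshold, the definition of confidence as `max(p, 1 - p)`, and the rule that a report passes only when no confident AC label disagrees with it are all assumptions made for illustration. The sketch assumes that disease labels have already been extracted from the generated report and that a disease classifier (the AC) has produced a per-disease probability from the image.

```python
from dataclasses import dataclass, field


@dataclass
class AuditResult:
    passed: bool
    disagreements: list = field(default_factory=list)


def audit_report(report_labels, ac_probs, threshold=0.5, min_confidence=0.0):
    """Compare labels implied by a generated report with AC predictions.

    report_labels: dict mapping disease name -> 0/1, extracted from the report.
    ac_probs: dict mapping disease name -> probability from the image classifier.
    threshold and min_confidence are hypothetical parameters, not from the paper.
    """
    disagreements = []
    for disease, prob in ac_probs.items():
        # Assumed confidence measure: distance of the prediction from a coin flip.
        confidence = max(prob, 1.0 - prob)
        if confidence < min_confidence:
            continue  # the AC is too unsure to vote on this disease
        ac_label = int(prob >= threshold)
        # A disease missing from the report is treated as "not mentioned" (0).
        if report_labels.get(disease, 0) != ac_label:
            disagreements.append(disease)
    return AuditResult(passed=not disagreements, disagreements=disagreements)


if __name__ == "__main__":
    report = {"Cardiomegaly": 1, "Edema": 0}
    ac = {"Cardiomegaly": 0.9, "Edema": 0.55}
    print(audit_report(report, ac))
    print(audit_report(report, ac, min_confidence=0.7))
```

Under these assumptions, raising `min_confidence` lets only confident AC predictions veto a report, which is one plausible way to use AC confidence in the audit. Reports that pass would then be the filtered subset whose F1 score is compared against the unfiltered set.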