Caveats in Generating Medical Imaging Labels from Radiology Reports
This work highlights critical limitations of using radiology reports for automated medical image annotation, an incremental but important caution for researchers and practitioners in medical AI.
The study investigated the discrepancy between what radiologists see in chest X-rays and what they write in their clinical reports. It revealed significant inconsistencies that undermine the reliability of these reports as ground truth for automatic label extraction; state-of-the-art NLP methods, in turn, failed to produce high-fidelity labels.
Acquiring high-quality annotations in medical imaging is usually a costly process. Automatic label extraction with natural language processing (NLP) has emerged as a promising workaround that bypasses the need for expert annotation. Despite its convenience, the limitations of this approximation have not been carefully examined and are not well understood. With a challenging set of 1,000 chest X-ray studies and their corresponding radiology reports, we show that there exists a surprisingly large discrepancy between what radiologists visually perceive and what they clinically report. Furthermore, with inherently flawed reports as ground truth, state-of-the-art medical NLP fails to produce high-fidelity labels.
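
The discrepancy between image-based and report-based labels can be quantified in several ways; the abstract does not say which measure the study uses. As a minimal sketch, assuming binary labels for a single finding, the Python snippet below computes raw agreement, Cohen's kappa, and the sensitivity of report-derived labels relative to image-derived labels. The function name `agreement_stats` and the toy label lists are hypothetical and serve only as illustration; they are not data or code from the study.

```python
from typing import Optional, Sequence


def agreement_stats(image_labels: Sequence[int], report_labels: Sequence[int]) -> dict:
    """Compare binary labels read from the image with labels derived from the report.

    Hypothetical helper for illustration; image labels are treated as the reference.
    """
    if len(image_labels) != len(report_labels) or not image_labels:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(image_labels)
    pairs = list(zip(image_labels, report_labels))
    tp = sum(1 for img, rep in pairs if img == 1 and rep == 1)  # both positive
    tn = sum(1 for img, rep in pairs if img == 0 and rep == 0)  # both negative
    fp = sum(1 for img, rep in pairs if img == 0 and rep == 1)  # only the report is positive
    fn = sum(1 for img, rep in pairs if img == 1 and rep == 0)  # finding seen but not reported

    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal label frequencies.
    expected = ((tp + fn) / n) * ((tp + fp) / n) + ((tn + fp) / n) * ((tn + fn) / n)
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0

    # Fraction of image-visible findings that also appear in the report.
    sensitivity: Optional[float] = tp / (tp + fn) if (tp + fn) else None

    return {"observed": observed, "kappa": kappa, "sensitivity": sensitivity}


# Toy labels (made up for illustration, not taken from the study).
image = [1, 1, 1, 1, 0, 0, 0, 0]   # what a radiologist sees in the image
report = [1, 1, 0, 0, 0, 0, 0, 1]  # what the report states
stats = agreement_stats(image, report)
print(stats)
```

On these toy labels, half of the visible findings are missing from the report, so any NLP labeler trained or evaluated against the report inherits that gap no matter how accurately it parses the text.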