Extracting Radiological Findings With Normalized Anatomical Information Using a Span-Based BERT Relation Extraction Model
This work addresses the need for large-scale structured data from medical imaging reports to support diagnosis and treatment, particularly for conditions such as cancer. The contribution is incremental: it applies existing BERT-based methods to a domain-specific task.
The paper tackles the problem of extracting and normalizing anatomical information from unstructured radiology reports to create structured semantic representations. It uses a span-based BERT relation extraction model and examines factors that influence performance, such as body part and span length.
Medical imaging is critical to the diagnosis and treatment of numerous medical problems, including many forms of cancer. Medical imaging reports distill the findings and observations of radiologists, creating an unstructured textual representation of unstructured medical images. Large-scale use of this text-encoded information requires converting the unstructured text to a structured, semantic representation. We explore the extraction and normalization of anatomical information in radiology reports that is associated with radiological findings. We investigate this extraction and normalization task using a span-based relation extraction model that jointly extracts entities and relations using BERT. This work examines the factors that influence extraction and normalization performance, including the body part/organ system, frequency of occurrence, span length, and span diversity. It discusses approaches for improving performance and creating high-quality semantic representations of radiological phenomena.
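To make the span-based formulation concrete, here is a minimal sketch of the candidate-generation step that span-based joint extraction models typically rely on: enumerate every token span up to a maximum length, then form ordered pairs of the spans kept as entities so that each pair can be scored as a possible relation. The function names, the `max_len` value, and the example sentence are illustrative assumptions and are not taken from the paper. The BERT encoder and the span and relation classifiers are omitted.

```python
from itertools import permutations
from typing import List, Tuple

# A span is a (start, end) pair of token indices, with end exclusive.
Span = Tuple[int, int]


def enumerate_spans(tokens: List[str], max_len: int) -> List[Span]:
    """Return all contiguous token spans of length 1..max_len."""
    spans = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans


def candidate_pairs(entity_spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return ordered (head, tail) pairs of distinct entity spans.

    In a full model, each pair would be passed to a relation classifier.
    """
    return list(permutations(entity_spans, 2))


# Hypothetical example sentence (not from the paper's data).
tokens = "nodule in the right upper lobe".split()
spans = enumerate_spans(tokens, max_len=3)

# Assume a span classifier kept these two spans: a finding and its anatomy.
kept = [(0, 1), (3, 6)]  # "nodule", "right upper lobe"
pairs = candidate_pairs(kept)
```

Capping the span length keeps the number of candidates roughly linear in sentence length, but any entity longer than the cap can never be proposed. That is one plausible reason span length would affect extraction performance, though the abstract does not state the mechanism.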