RadAnnotate: Large Language Models for Efficient and Reliable Radiology Report Annotation

Saisha Pradeep Shetty, Roger Eric Goldman, Vladimir Filkov

arXiv:2603.16002

AI Analysis

This work addresses the need for efficient and reliable annotation in clinical NLP, specifically for radiology reports. Its contribution is incremental: it covers entity labeling only and leaves relation extraction for future work.

The paper tackles the slow and costly manual annotation of radiology reports with RadAnnotate, an LLM-based framework that combines retrieval-augmented synthetic reports with confidence-based selective automation. RadAnnotate automatically annotates 55-90% of reports at entity match scores of 0.86-0.92, and in a low-resource setting synthetic augmentation improves F1 on uncertain observations from 0.61 to 0.70.

Radiology report annotation is essential for clinical NLP, yet manual labeling is slow and costly. We present RadAnnotate, an LLM-based framework that studies retrieval-augmented synthetic reports and confidence-based selective automation to reduce expert effort for labeling in RadGraph. We study RadGraph-style entity labeling (graph nodes) and leave relation extraction (edges) to future work. First, we train entity-specific classifiers on gold-standard reports and characterize their strengths and failure modes across anatomy and observation categories, with uncertain observations hardest to learn. Second, we generate RAG-guided synthetic reports and show that synthetic-only models remain within 1-2 F1 points of gold-trained models, and that synthetic augmentation is especially helpful for uncertain observations in a low-resource setting, improving F1 from 0.61 to 0.70. Finally, by learning entity-specific confidence thresholds, RadAnnotate can automatically annotate 55-90% of reports at 0.86-0.92 entity match score while routing low-confidence cases for expert review.
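The confidence-based routing described in the abstract can be illustrated with a minimal sketch. The abstract does not say how thresholds are learned, so everything here is an assumption for illustration: the function names `learn_threshold` and `route`, the entity-type labels, the use of held-out (confidence, correct) pairs, and the rule of picking the lowest cutoff that still meets a target accuracy. It is not the authors' implementation.

```python
from typing import Dict, List, Tuple


def learn_threshold(scored: List[Tuple[float, bool]], target: float) -> float:
    """Pick the lowest confidence cutoff whose accepted set meets `target` accuracy.

    `scored` holds (confidence, was_correct) pairs from a held-out validation
    set for ONE entity type (hypothetical setup). Returns infinity when no
    cutoff reaches the target, so nothing is auto-accepted for that type.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    best = float("inf")
    correct = 0
    for count, (conf, ok) in enumerate(ranked, start=1):
        correct += ok
        # Only evaluate at the last item of a run of tied confidences,
        # because a cutoff accepts every prediction with that confidence.
        if count < len(ranked) and ranked[count][0] == conf:
            continue
        if correct / count >= target:
            best = conf
    return best


def route(confidences: Dict[str, float], thresholds: Dict[str, float]) -> str:
    """Route one report: auto-annotate only if every entity type clears its cutoff.

    `confidences` maps entity type to the model's confidence for this report.
    Entity types without a learned threshold always go to expert review.
    """
    for entity_type, conf in confidences.items():
        if conf < thresholds.get(entity_type, float("inf")):
            return "expert_review"
    return "auto_annotate"


if __name__ == "__main__":
    # Toy validation data, invented for illustration only.
    validation = [(0.95, True), (0.9, True), (0.8, True), (0.7, False), (0.6, True)]
    thresholds = {
        "anatomy": learn_threshold(validation, target=0.8),
        "obs_uncertain": learn_threshold(validation, target=0.9),
    }
    print(thresholds)
    print(route({"anatomy": 0.85, "obs_uncertain": 0.70}, thresholds))
```

Keeping a separate cutoff per entity type lets a harder category, such as uncertain observations, be held to a stricter confidence bar than an easier one. The trade-off is that fewer reports are then annotated automatically.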
