Uncertainty-Aware Structured Data Extraction from Full CMR Reports via Distilled LLMs
CMR-EXTR enables automated, auditable extraction of structured data from clinical CMR reports for cohort assembly and decision support, with integrated uncertainty estimates to triage human review.
CMR-EXTR converts free-text cardiac MRI reports into structured data with per-field confidence scores, achieving 99.65% variable-level accuracy via a teacher-student distillation pipeline.
Converting free-text cardiac magnetic resonance (CMR) reports into auditable structured data remains a bottleneck for cohort assembly, longitudinal curation, and clinical decision support. We present CMR-EXTR, a lightweight framework that converts free-text CMR reports into structured data and assigns a per-field confidence score for quality control. A teacher-student distillation pipeline enables fully offline inference while limiting the need for manual annotation. The uncertainty estimate integrates three complementary principles (distribution plausibility, sampling stability, and cross-field consistency) to triage human review. Experiments show that CMR-EXTR achieves 99.65% variable-level accuracy and produces informative confidence scores, indicating that extraction is both reliable and auditable. To our knowledge, this is the first CMR-specific extraction system with integrated confidence estimation. The code is available at https://github.com/yuyi1005/CMR-EXTR.
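As a rough illustration of how the three uncertainty principles could be combined into one per-field confidence score and a review flag, the minimal Python sketch below makes several assumptions. Sampling stability is taken as the fraction of repeated model samples that agree with the most common answer. Distribution plausibility is a hard check against an assumed value range. Cross-field consistency is shown for one example: the reported LVEF is compared with (LVEDV − LVESV) / LVEDV × 100. All function names, the min-aggregation rule, the LVEF range, the 5-point tolerance, and the 0.8 review threshold are hypothetical. None of them are taken from the CMR-EXTR code.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FieldResult:
    name: str
    value: float
    confidence: float
    needs_review: bool


def sampling_stability(samples):
    """Hypothetical: modal answer across repeated samples and its agreement rate."""
    value, freq = Counter(samples).most_common(1)[0]
    return value, freq / len(samples)


def distribution_plausibility(value, low, high):
    """Hypothetical: 1.0 if the value lies in an assumed plausible range, else 0.0."""
    return 1.0 if low <= value <= high else 0.0


def ef_consistency(lvef, lvedv, lvesv, tol=5.0):
    """Hypothetical cross-field check: reported LVEF vs. (EDV - ESV) / EDV * 100."""
    if lvedv <= 0:
        return 0.0
    derived = (lvedv - lvesv) / lvedv * 100.0
    return 1.0 if abs(derived - lvef) <= tol else 0.0


def score_field(name, samples, plausible_range, consistency=1.0, threshold=0.8):
    """Combine the three signals into one confidence and a triage flag."""
    value, stability = sampling_stability(samples)
    plausibility = distribution_plausibility(value, *plausible_range)
    # Assumed aggregation rule: the weakest signal determines the confidence.
    confidence = min(stability, plausibility, consistency)
    return FieldResult(name, value, confidence, confidence < threshold)


# Usage: five sampled LVEF extractions (in %) from the same report.
lvef_samples = [55.0, 55.0, 55.0, 55.0, 52.0]
modal_lvef, _ = sampling_stability(lvef_samples)
consistency = ef_consistency(modal_lvef, lvedv=150.0, lvesv=68.0)
print(score_field("LVEF", lvef_samples, (0.0, 100.0), consistency))
```

Taking the minimum means that a field is flagged for human review whenever any single check is weak. A weighted average would be an equally plausible design choice. It would trade some sensitivity for fewer review requests.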