IR · Mar 12

Reproducible Synthetic Clinical Letters for Seizure Frequency Information Extraction

Yujian Gan, Stephen H. Barlow, Ben Holgate, Joe Davies, James T. Teo, Joel S. Winston, Mark P. Richardson

arXiv:2603.11407v1 · 9.3 · h-index: 4

Predicted impact: top 55% in IR · last 90 days · Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of extracting seizure-frequency information for epilepsy research and clinical care without sharing sensitive patient data. It is a domain-specific, incremental advance.

The researchers tackled the challenge of extracting seizure frequency from variable free-text clinic letters by developing a reproducible, privacy-preserving framework using synthetic letters, achieving micro-F1 scores up to 0.788 for fine-grained categories and 0.847 for pragmatic categories on real clinic letters.
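Since the headline numbers are micro-F1 scores, a minimal sketch of how micro-averaged F1 is computed may help when reading them. It assumes one gold category per letter and allows the model to abstain (for example, when its output cannot be parsed). The category names and the `micro_f1` helper are hypothetical illustrations, not the paper's evaluation code or label set.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1: pool TP/FP/FN over all categories, then compute F1 once."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p == g:
            tp += 1          # correct category
        else:
            if p is not None:
                fp += 1      # a wrong category was predicted
            fn += 1          # the gold category was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical category names; None marks an abstained or unparseable prediction.
gold = ["weekly", "seizure_free", "unknown", "monthly"]
pred = ["weekly", "seizure_free", "monthly", None]
score = micro_f1(gold, pred)
print(round(score, 3))
```

When every letter receives exactly one predicted category, this quantity reduces to plain accuracy; the two differ only when some predictions are missing.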

Seizure-frequency information is important for epilepsy research and clinical care, but it is usually recorded in variable free-text clinic letters that are hard to annotate and share. We developed a reproducible, privacy-preserving framework for extracting seizure frequency using fully synthetic yet task-faithful epilepsy letters. We defined a structured label scheme covering common descriptions of seizure burden, including explicit rates, ranges, clusters, seizure-free intervals, unknown frequency, and explicit no-seizure statements. A teacher language model generated NHS-style synthetic letters paired with normalized labels, rationales, and evidence spans. We fine-tuned several open-weight language models (4B-14B parameters) on these synthetic letters to extract seizure frequency from full documents, comparing direct numeric prediction with structured label prediction and testing evidence-grounded outputs. On a clinician-checked held-out set of real clinic letters, models trained only on synthetic data generalized well, and structured labels consistently outperformed direct numeric regression. With 15,000 synthetic training letters, models achieved micro-F1 scores up to 0.788 for fine-grained categories and 0.847 for pragmatic categories; a medically oriented 4B model achieved 0.787 and 0.858, respectively. Evidence-grounded outputs also supported rapid clinical verification and error analysis. These results show that synthetic, structured, evidence-grounded supervision can enable robust seizure-frequency extraction without sharing sensitive patient text and may generalize to other temporally complex clinical information extraction tasks.
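To make the idea of a structured, evidence-grounded label concrete, the following is a rough sketch of what such a label could look like and how it might be normalized to a single rate. Everything here is an assumption for illustration: the `SeizureFrequencyLabel` class, its field names, the `kind` values, the 30-day month, and the use of a range midpoint are not taken from the paper, and cluster descriptions are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed period lengths in days (hypothetical normalization constants).
DAYS_PER = {"day": 1.0, "week": 7.0, "month": 30.0, "year": 365.0}


@dataclass
class SeizureFrequencyLabel:
    kind: str                      # "rate", "range", "seizure_free", "no_seizures", "unknown"
    low: Optional[float] = None    # count for a rate, or lower bound of a range
    high: Optional[float] = None   # upper bound of a range
    period: Optional[str] = None   # key of DAYS_PER
    evidence: str = ""             # text span from the letter supporting the label

    def per_month(self) -> Optional[float]:
        """Return seizures per 30-day month, or None if no rate can be derived."""
        if self.kind in ("seizure_free", "no_seizures"):
            return 0.0
        if self.kind not in ("rate", "range") or self.low is None or self.period is None:
            return None
        high = self.low if self.high is None else self.high
        midpoint = (self.low + high) / 2  # assumption: summarize a range by its midpoint
        return midpoint * DAYS_PER["month"] / DAYS_PER[self.period]


label = SeizureFrequencyLabel(
    kind="range", low=2, high=4, period="week",
    evidence="two to four seizures per week",
)
monthly_rate = label.per_month()
```

Keeping the evidence span alongside the normalized fields is what would let a clinician check a prediction against the letter text quickly, in the spirit of the verification workflow the abstract describes.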
