Semantic NLP Pipelines for Interoperable Patient Digital Twins from Unstructured EHRs
This work addresses inconsistent clinical documentation, a practical problem for healthcare professionals, though the contribution appears incremental because it applies existing NLP techniques to a specific domain.
The paper tackles the challenge of generating interoperable patient digital twins from unstructured electronic health records (EHRs). It develops a semantic NLP-driven pipeline that transforms free-text EHR notes into FHIR-compliant representations, and it reports high F1-scores for entity and relation extraction, along with improved schema completeness and interoperability compared to baseline methods.
Digital twins -- virtual replicas of physical entities -- are gaining traction in healthcare for personalized monitoring, predictive modeling, and clinical decision support. However, generating interoperable patient digital twins from unstructured electronic health records (EHRs) remains challenging because clinical documentation is highly variable and standardized mappings are lacking. This paper presents a semantic NLP-driven pipeline that transforms free-text EHR notes into FHIR-compliant digital twin representations. The pipeline uses named entity recognition (NER) to extract clinical concepts, concept normalization to map entities to SNOMED CT or ICD-10, and relation extraction to capture structured associations between conditions, medications, and observations. Evaluation on the MIMIC-IV Clinical Database Demo, with validation against MIMIC-IV-on-FHIR reference mappings, demonstrates high F1-scores for entity and relation extraction, with improved schema completeness and interoperability compared to baseline methods.
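The three stages named in the abstract (NER, concept normalization, relation extraction) and the FHIR output can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary lookup stands in for a trained NER and normalization model, the same-sentence heuristic stands in for a learned relation extractor, and the names `LEXICON`, `extract_entities`, `to_fhir`, and `build_twin` are hypothetical. The use of RxNorm for the medication code is also an assumption (the abstract mentions only SNOMED CT and ICD-10), and the codes shown are illustrative and should be checked against a terminology server. Resource field names follow FHIR R4.

```python
import json
import re
from dataclasses import dataclass

# Hypothetical toy lexicon: surface form -> (FHIR resource type, code system URI, code, display).
# A real pipeline would use trained NER and normalization models instead.
LEXICON = {
    "type 2 diabetes": ("Condition", "http://snomed.info/sct", "44054006", "Diabetes mellitus type 2"),
    "hypertension": ("Condition", "http://snomed.info/sct", "38341003", "Hypertensive disorder"),
    # Assumption: RxNorm is used for medications; the source names only SNOMED CT and ICD-10.
    "metformin": ("MedicationStatement", "http://www.nlm.nih.gov/research/umls/rxnorm", "6809", "metformin"),
}


@dataclass
class Entity:
    text: str
    start: int
    resource_type: str
    system: str
    code: str
    display: str


def extract_entities(text):
    """Dictionary-based stand-in for NER plus concept normalization."""
    found = []
    for term, (rtype, system, code, display) in LEXICON.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append(Entity(m.group(0), m.start(), rtype, system, code, display))
    return sorted(found, key=lambda e: e.start)


def to_fhir(entity, index, patient_id):
    """Wrap one normalized entity in a minimal FHIR R4 resource (as a dict)."""
    concept = {
        "coding": [{"system": entity.system, "code": entity.code, "display": entity.display}],
        "text": entity.text,
    }
    resource = {
        "resourceType": entity.resource_type,
        "id": f"{entity.resource_type.lower()}-{index}",
        "subject": {"reference": f"Patient/{patient_id}"},
    }
    if entity.resource_type == "MedicationStatement":
        resource["status"] = "active"  # assumed; a real system would infer this from context
        resource["medicationCodeableConcept"] = concept
    else:
        resource["code"] = concept
    return resource


def build_twin(note, patient_id):
    """Build a FHIR Bundle; link medications to conditions found in the same sentence."""
    resources = []
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        entities = extract_entities(sentence)
        batch = [to_fhir(e, len(resources) + i, patient_id) for i, e in enumerate(entities)]
        conditions = [r for r in batch if r["resourceType"] == "Condition"]
        for r in batch:
            # Toy relation extraction: same-sentence co-occurrence implies "treats".
            if r["resourceType"] == "MedicationStatement" and conditions:
                r["reasonReference"] = [{"reference": f"Condition/{c['id']}"} for c in conditions]
        resources.extend(batch)
    return {"resourceType": "Bundle", "type": "collection",
            "entry": [{"resource": r} for r in resources]}


note = "Patient has type 2 diabetes and was started on metformin. History of hypertension."
twin = build_twin(note, "example")
print(json.dumps(twin, indent=2))
```

Co-occurrence within a sentence is a deliberately crude relation heuristic; it ignores negation, temporality, and cross-sentence references, which is where a learned relation extractor would be expected to help.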