CL, LG · Nov 14, 2022

Semantic Decomposition Improves Learning of Large Language Models on EHR Data

David A. Bloore, Romane Gauriau, Anna L. Decker, Jacob Oppenheim

arXiv:2212.06040v1 · 0.61 citations · h-index: 8

Originality: Incremental advance

AI Analysis

This work addresses the problem of extracting actionable insights from noisy EHR data for healthcare applications, representing an incremental advancement by building on existing BERT and GAT methods.

The researchers tackled the challenge of learning patterns from irregular, semi-structured electronic health records (EHR) by decomposing medical codes into semantic units with hierarchical graphs, resulting in significant improvements in predicting over 500 medical diagnosis classes as measured by aggregated AUC and APS.

Electronic health records (EHR) are widely believed to hold a profusion of actionable insights, encrypted in an irregular, semi-structured format, amidst a loud noise background. To simplify learning patterns of health and disease, medical codes in EHR can be decomposed into semantic units connected by hierarchical graphs. Building on earlier synergy between Bidirectional Encoder Representations from Transformers (BERT) and Graph Attention Networks (GAT), we present H-BERT, which ingests complete graph tree expansions of hierarchical medical codes, rather than only the leaves, and pushes patient-level labels down to each visit. This methodology significantly improves prediction of patient membership in over 500 medical diagnosis classes, as measured by aggregated AUC and APS, and creates distinct representations of patients in closely related but clinically distinct phenotypes.
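The two ideas in the abstract, expanding each hierarchical code into its full tree path and copying patient-level labels down to every visit, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `expand_code` and `push_labels_to_visits`, the dotted ICD-10-style code format, and the rule that each character after the dot adds one hierarchy level are all assumptions made for illustration only.

```python
from typing import Dict, List, Sequence


def expand_code(code: str) -> List[str]:
    """Return the root-to-leaf path of a dotted hierarchical code.

    Assumption (not from the paper): the category before the dot is one
    node, and each further character after the dot is one level deeper.
    """
    category, _, detail = code.partition(".")
    path = [category]
    for i in range(1, len(detail) + 1):
        path.append(f"{category}.{detail[:i]}")
    return path


def push_labels_to_visits(
    visits: Sequence[Sequence[str]], patient_labels: Sequence[str]
) -> List[Dict[str, list]]:
    """Attach the patient-level labels to every visit (hypothetical helper).

    Each visit keeps the expanded path of every code it contains, so a
    model sees ancestors as well as leaves.
    """
    return [
        {
            "codes": [expand_code(code) for code in visit],
            "labels": list(patient_labels),
        }
        for visit in visits
    ]


# Two visits for one patient carrying a single patient-level label.
example = push_labels_to_visits([["E11.65"], ["I10"]], ["diabetes"])
```

Here `expand_code("E11.65")` yields the path `E11`, `E11.6`, `E11.65` instead of the leaf alone, and both visits in `example` carry the same `diabetes` label.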
