Unifying Heterogeneous Electronic Health Records Systems via Text-Based Code Embedding
This work addresses interoperability issues in EHR systems for healthcare providers and researchers, enabling broader deployment of deep learning models across clinics and hospitals. The contribution is incremental, since it builds on existing neural language models.
The paper tackles the lack of a unified code system in electronic health records (EHR) by introducing DescEmb, a text-based embedding framework that represents clinical events by their textual descriptions. DescEmb outperformed traditional code-based methods in zero-shot transfer tasks and enabled a single model to be trained on heterogeneous datasets.
EHR systems lack a unified code system for representing medical concepts, which acts as a barrier for the deployment of deep learning models in large scale to multiple clinics and hospitals. To overcome this problem, we introduce Description-based Embedding, DescEmb, a code-agnostic representation learning framework for EHR. DescEmb takes advantage of the flexibility of neural language understanding models to embed clinical events using their textual descriptions rather than directly mapping each event to a dedicated embedding. DescEmb outperformed traditional code-based embedding in extensive experiments, especially in a zero-shot transfer task (one hospital to another), and was able to train a single unified model for heterogeneous EHR datasets.
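The contrast between a dedicated per-code embedding and a description-based embedding can be illustrated with a minimal sketch. This is not the paper's implementation: the neural language model is replaced here by a hashed bag-of-words encoder, and the names `token_vector`, `CodeEmbedding`, `embed_description`, the codes `A-1001` and `B-CREA`, and the dimension `DIM` are all hypothetical, chosen only to show why text-based embeddings can transfer across code systems.

```python
import hashlib
import math
from typing import Dict, List

DIM = 16  # toy embedding size, chosen arbitrarily for this sketch


def token_vector(token: str) -> List[float]:
    """Deterministic pseudo-random vector for a token.

    Stand-in for learned parameters; a real model would train these.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 127.5 - 1.0 for b in digest[:DIM]]


class CodeEmbedding:
    """Code-based baseline: one dedicated vector per code seen in training."""

    def __init__(self, codes: List[str]) -> None:
        self.table: Dict[str, List[float]] = {
            c: token_vector("code:" + c) for c in codes
        }

    def embed(self, code: str) -> List[float]:
        # Raises KeyError for a code that was not in the training vocabulary.
        return self.table[code]


def embed_description(description: str) -> List[float]:
    """Description-based: encode the event's text and ignore its code.

    Assumption: a normalized bag-of-words sum replaces the neural text encoder.
    """
    tokens = description.lower().split()
    if not tokens:
        raise ValueError("description must contain at least one token")
    summed = [0.0] * DIM
    for tok in tokens:
        for i, v in enumerate(token_vector(tok)):
            summed[i] += v
    norm = math.sqrt(sum(v * v for v in summed)) or 1.0
    return [v / norm for v in summed]


# Hypothetical code systems for the same lab event at two hospitals.
hospital_a = {"A-1001": "Creatinine blood test"}
hospital_b = {"B-CREA": "creatinine blood test"}

# The code-based model built on hospital A has no entry for hospital B's code.
code_model = CodeEmbedding(list(hospital_a))
try:
    code_model.embed("B-CREA")
    code_transfers = True
except KeyError:
    code_transfers = False

# The description-based encoder only sees text, so both events are embeddable.
vec_a = embed_description(hospital_a["A-1001"])
vec_b = embed_description(hospital_b["B-CREA"])
text_transfers = vec_a == vec_b
```

In this toy the two descriptions differ only in letter case, so their vectors coincide exactly. With a trained language model as the encoder, differently worded descriptions of the same event would be expected to land near each other rather than match exactly, which is the property the zero-shot transfer setting relies on.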