Application of Clinical Concept Embeddings for Heart Failure Prediction in UK EHR data
This work addresses a problem relevant to healthcare researchers and practitioners: EHR data are high-dimensional and heterogeneous. The contribution is incremental, however, as it applies an existing embedding method to a new dataset.
The authors tackle the challenge of feature engineering in electronic health records (EHR) by using GloVe to learn embeddings for diagnoses and procedures recorded using 13 million ontology terms across 2.7 million hospitalisations in UK EHR data. They demonstrate the utility of these embeddings by predicting the risk of hospitalisation for congestive heart failure, and conclude that embeddings can support robust disease risk prediction models.
Electronic health records (EHR) are increasingly being used for constructing disease risk prediction models. Feature engineering in EHR data, however, is challenging due to their high-dimensional and heterogeneous nature. Low-dimensional representations of EHR data can potentially mitigate these challenges. In this paper, we use global vectors (GloVe) to learn word embeddings for diagnoses and procedures recorded using 13 million ontology terms across 2.7 million hospitalisations in national UK EHR. We demonstrate the utility of these embeddings by evaluating their performance in identifying patients who are at higher risk of being hospitalised for congestive heart failure. Our findings indicate that embeddings can enable the creation of robust EHR-derived disease risk prediction models and address some of the limitations associated with manual clinical feature engineering.
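
To make the embedding step concrete, the following is a minimal sketch of how GloVe-style vectors could be learned from coded hospitalisations. It is not the authors' pipeline, whose details are not given here. The toy records, the choice to count two codes as co-occurring when they appear in the same hospitalisation, the function names `build_cooccurrence` and `train_glove`, and all hyperparameters are assumptions made for illustration. The training loop uses plain stochastic gradient descent on the standard GloVe weighted least-squares objective.

```python
import itertools
from collections import Counter

import numpy as np

# Hypothetical toy data: each inner list holds the codes of one hospitalisation.
hospitalisations = [
    ["I50", "I10", "E11"],
    ["I50", "I10", "K40"],
    ["I21", "I10", "K49"],
    ["E11", "I10"],
    ["I50", "I21", "I10"],
]


def build_cooccurrence(visits):
    """Count how often two distinct codes share a hospitalisation (assumed context)."""
    vocab = sorted({code for visit in visits for code in visit})
    index = {code: i for i, code in enumerate(vocab)}
    counts = Counter()
    for visit in visits:
        for a, b in itertools.combinations(sorted(set(visit)), 2):
            counts[(index[a], index[b])] += 1
            counts[(index[b], index[a])] += 1  # keep the matrix symmetric
    return vocab, counts


def train_glove(counts, vocab_size, dim=8, epochs=200, lr=0.05,
                x_max=10.0, alpha=0.75, seed=0):
    """Minimise sum f(x_ij) * (w_i . c_j + b_i + b_j - log x_ij)^2 with SGD."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(vocab_size, dim))  # target vectors
    C = rng.normal(scale=0.1, size=(vocab_size, dim))  # context vectors
    bw = np.zeros(vocab_size)
    bc = np.zeros(vocab_size)
    pairs = list(counts.items())
    losses = []
    for _ in range(epochs):
        total = 0.0
        for (i, j), x in pairs:
            weight = min(1.0, (x / x_max) ** alpha)  # GloVe weighting function
            err = W[i] @ C[j] + bw[i] + bc[j] - np.log(x)
            total += weight * err ** 2
            grad = weight * err  # constant factor 2 folded into the learning rate
            w_old = W[i].copy()
            W[i] -= lr * grad * C[j]
            C[j] -= lr * grad * w_old
            bw[i] -= lr * grad
            bc[j] -= lr * grad
        losses.append(total)
    # Summing target and context vectors is a common way to form the final embedding.
    return W + C, losses


vocab, counts = build_cooccurrence(hospitalisations)
embeddings, losses = train_glove(counts, len(vocab))
```

Each row of `embeddings` is a dense vector for one code in `vocab`. One plausible way to use such vectors downstream, again an assumption rather than the method described above, is to aggregate the vectors of a patient's recorded codes and feed the result to a risk classifier.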