LG NE · Feb 11, 2016

Medical Concept Representation Learning from Electronic Health Records and its Application on Heart Failure Prediction

Edward Choi, Andy Schuetz, Walter F. Stewart, Jimeng Sun

arXiv:1602.03686v2 · 13.4 · 142 citations

Originality: Incremental advance

AI Analysis

The paper addresses the need for better feature extraction in health analytics, though the advance is incremental: it applies existing representation learning methods to a specific domain.

The paper tackled the problem of representing heterogeneous medical concepts from electronic health records by learning embeddings based on temporal co-occurrence patterns, resulting in up to 23% improvement in AUC for heart failure prediction.

Objective: To transform heterogeneous clinical data from electronic health records into clinically meaningful constructed features using a data-driven method that relies, in part, on temporal relations among data.

Materials and Methods: Clinically meaningful representations of medical concepts and patients are key to health analytic applications. Most existing approaches directly construct features mapped to raw data (e.g., ICD or CPT codes), or utilize some ontology mapping such as SNOMED codes. However, none of the existing approaches leverages EHR data directly to learn such concept representations. We propose a new way to represent heterogeneous medical concepts (e.g., diagnoses, medications and procedures) based on co-occurrence patterns in longitudinal electronic health records. The intuition behind the method is to map medical concepts that co-occur closely in time to similar concept vectors, so that the distance between them is small. We also derive a simple method to construct patient vectors from the related medical concept vectors.

Results: For qualitative evaluation, we study similar medical concepts across diagnoses, medications and procedures. In quantitative evaluation, our proposed representation significantly improves predictive modeling performance for the onset of heart failure (HF), where classification methods (e.g., logistic regression, neural network, support vector machine and K-nearest neighbors) achieve up to 23% improvement in area under the ROC curve (AUC) using the proposed representation.

Conclusion: We proposed an effective method for patient and medical concept representation learning. The resulting representation maps relevant concepts together and also improves predictive modeling performance.
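The intuition in the abstract (concepts that co-occur closely in time should get similar vectors, and patient vectors are built from concept vectors) can be illustrated with a minimal Python sketch. It is not the authors' method: it uses raw windowed co-occurrence counts as a stand-in for learned embeddings, and it assumes that a patient vector is the sum of that patient's concept vectors. The toy records, code names, the 7-day window, and the function names are all hypothetical.

```python
from math import sqrt

# Hypothetical toy records: each patient is a time-ordered list of (day, code).
patients = {
    "p1": [(0, "DX:hypertension"), (1, "RX:lisinopril"),
           (40, "DX:heart_failure"), (41, "RX:furosemide")],
    "p2": [(0, "DX:hypertension"), (2, "RX:lisinopril"),
           (90, "PX:echocardiogram")],
    "p3": [(0, "DX:heart_failure"), (1, "RX:furosemide"),
           (3, "PX:echocardiogram")],
}


def cooccurrence_vectors(patients, window_days=7):
    """Count, for every code, the codes seen within window_days of it.

    Each event also counts itself (assumption), so two codes that always
    appear together end up with matching count vectors.
    """
    vocab = sorted({code for events in patients.values() for _, code in events})
    index = {code: i for i, code in enumerate(vocab)}
    vectors = {code: [0.0] * len(vocab) for code in vocab}
    for events in patients.values():
        for day_i, code_i in events:
            for day_j, code_j in events:
                if abs(day_i - day_j) <= window_days:
                    vectors[code_i][index[code_j]] += 1.0
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def patient_vector(events, vectors):
    """Sum the concept vectors of a patient's events (assumed aggregation)."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for _, code in events:
        for k, value in enumerate(vectors[code]):
            total[k] += value
    return total


vocab, vectors = cooccurrence_vectors(patients)
close = cosine(vectors["DX:hypertension"], vectors["RX:lisinopril"])
far = cosine(vectors["DX:hypertension"], vectors["RX:furosemide"])
p1_vector = patient_vector(patients["p1"], vectors)
print(close, far, p1_vector)
```

In this toy data, the hypertension and lisinopril codes always appear within the window and so receive near-identical vectors, while hypertension and furosemide never do. A real pipeline would replace the count vectors with learned low-dimensional embeddings and feed the patient vectors to a classifier such as logistic regression.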
