Deep EHR: Chronic Disease Prediction Using Medical Notes
This work addresses the challenge of leveraging unstructured EHR data for chronic disease prediction, a task that matters to healthcare professionals and to resource allocation. The contribution is incremental, since it builds on existing deep learning methods.
The authors tackle early detection of preventable diseases with a multi-task framework that combines unstructured medical notes with structured EHR data. On a cohort of about 1 million patients, models using text outperform those using only structured data, and handling negations and numerical values in the text improves performance further.
Early detection of preventable diseases is important for better disease management, improved interventions, and more efficient healthcare resource allocation. Various machine learning approaches have been developed to utilize information in Electronic Health Record (EHR) for this task. Majority of previous attempts, however, focus on structured fields and lose the vast amount of information in the unstructured notes. In this work we propose a general multi-task framework for disease onset prediction that combines both free-text medical notes and structured information. We compare performance of different deep learning architectures including CNN, LSTM and hierarchical models. In contrast to traditional text-based prediction models, our approach does not require disease specific feature engineering, and can handle negations and numerical values that exist in the text. Our results on a cohort of about 1 million patients show that models using text outperform models using just structured data, and that models capable of using numerical values and negations in the text, in addition to the raw text, further improve performance. Additionally, we compare different visualization methods for medical professionals to interpret model predictions.
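The abstract does not say how negations and numerical values are represented, so the following is only an illustrative sketch of one common way to expose both to a text model, not the authors' method. It assumes a small hypothetical trigger list, a fixed negation window, and a `<NUM>` placeholder token with the numeric value carried in a parallel list; the names `NEGATION_TRIGGERS`, `preprocess`, and `scope` are all invented for this example.

```python
import re

# Hypothetical trigger list (an assumption); clinical lexicons are far larger.
NEGATION_TRIGGERS = {"no", "not", "denies", "without"}

# Words (may contain digits, e.g. "hba1c"), plain numbers, sentence punctuation.
TOKEN = re.compile(r"[a-z][a-z0-9]*|\d+(?:\.\d+)?|[.;]")
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def preprocess(note, scope=3):
    """Return (tokens, values) for one free-text note.

    Tokens within `scope` words after a negation trigger get a NEG_ prefix.
    Each number becomes a <NUM> placeholder, and its value is stored at the
    same position in `values` (0.0 for every non-numeric token).
    """
    tokens, values = [], []
    remaining = 0  # upcoming tokens still inside the current negation scope
    for raw in TOKEN.findall(note.lower()):
        if raw in {".", ";"}:
            remaining = 0  # sentence punctuation closes the negation scope
            continue
        if raw in NEGATION_TRIGGERS:
            remaining = scope  # open a new scope; the trigger itself is dropped
            continue
        if NUMBER.fullmatch(raw):
            tokens.append("<NUM>")
            values.append(float(raw))
        else:
            tokens.append("NEG_" + raw if remaining > 0 else raw)
            values.append(0.0)
        remaining = max(remaining - 1, 0)
    return tokens, values


if __name__ == "__main__":
    toks, vals = preprocess("Patient denies chest pain. HbA1c 7.2, no fever.")
    print(toks)
    print(vals)
```

Under these assumptions, a downstream model could embed the token sequence as usual and feed the aligned value sequence as an extra input channel, so that "HbA1c 7.2" and "HbA1c 5.1" are no longer indistinguishable once tokenized.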