LG AI ML | Oct 4, 2019

Unsupervised Representation for EHR Signals and Codes as Patient Status Vector

Sajad Darabi, Mohammad Kachuee, Majid Sarrafzadeh

arXiv:1910.01803v15.45 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of effective EHR modeling for healthcare applications, representing an incremental improvement in unsupervised learning methods for this domain.

The paper tackles the challenge of modeling irregular electronic health records with an unsupervised two-step representation learning scheme that summarizes multi-modal clinical data into a patient status vector. The authors report improved generalization performance on mortality and readmission prediction for ICU visits.

Effective modeling of electronic health records presents many challenges, as the records contain a large amount of irregularity, most of which is due to the varying procedures and diagnoses a patient may have. Despite the recent progress in machine learning, unsupervised learning remains largely an open problem, especially in the healthcare domain. In this work, we present a two-step unsupervised representation learning scheme to summarize multi-modal clinical time series, consisting of signals and medical codes, into a patient status vector. First, an auto-encoder step is used to reduce sparse medical codes and clinical time series into a distributed representation. Subsequently, the concatenation of the distributed representations is further fine-tuned using a forecasting task. We evaluate the usefulness of the representation on two downstream tasks: mortality and readmission prediction. Our proposed method shows improved generalization performance for both short-duration and long-duration ICU visits.
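The data flow of the two-step scheme described in the abstract can be sketched roughly as follows. This is a toy illustration under strong assumptions, not the authors' implementation: the data are random, all sizes are made up, a closed-form linear auto-encoder (PCA via SVD) stands in for the neural auto-encoders, and the forecasting step only fits a linear head on the concatenated vector instead of fine-tuning the representation itself. The names `fit_linear_encoder`, `encode`, and `status` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data; every size here is an illustrative assumption.
n_patients, n_codes, n_steps, n_signals = 64, 20, 8, 3
codes = (rng.random((n_patients, n_codes)) < 0.1).astype(float)  # sparse multi-hot medical codes
signals = rng.normal(size=(n_patients, n_steps + 1, n_signals))  # clinical time series
history, next_step = signals[:, :-1], signals[:, -1]             # forecast target: final time step


def fit_linear_encoder(X, dim):
    """Closed-form linear auto-encoder (PCA): return the feature mean and a projection matrix."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:dim].T


def encode(X, mean, W):
    """Project centered inputs onto the learned low-dimensional subspace."""
    return (X - mean) @ W


# Step 1: one auto-encoder per modality gives a dense, distributed representation.
code_mean, code_W = fit_linear_encoder(codes, dim=4)
flat_history = history.reshape(n_patients, -1)
sig_mean, sig_W = fit_linear_encoder(flat_history, dim=6)

# Concatenate the two representations into a single patient status vector.
status = np.concatenate(
    [encode(codes, code_mean, code_W), encode(flat_history, sig_mean, sig_W)],
    axis=1,
)

# Step 2 (simplified): fit a linear forecasting head that predicts the next
# signal values from the status vector. Only the head is fit here.
design = np.hstack([status, np.ones((n_patients, 1))])  # add a bias column
head, *_ = np.linalg.lstsq(design, next_step, rcond=None)
forecast = design @ head

print(status.shape, forecast.shape)
```

In a downstream setting, `status` would be the input to a mortality or readmission classifier; a faithful version would back-propagate the forecasting loss into the encoders so that the representation itself is updated.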
