LG, ML, Nov 1, 2018

A latent topic model for mining heterogenous non-randomly missing electronic health records data

arXiv:1811.00464v1
Originality: Incremental advance
AI Analysis

This work addresses computational challenges in EHR data mining for healthcare applications, but it is incremental as it builds on existing latent topic models and collaborative filtering techniques.

The authors address the problem of mining heterogeneous, non-randomly missing electronic health records (EHR) data with mixEHR, an unsupervised generative model that integrates collaborative filtering and latent topic models. Applied to 12.8 million observations from the MIMIC dataset, the model is used to reveal disease topics, impute missing data, and predict mortality, and the authors report that it outperforms previous methods.

Electronic health records (EHR) are a rich, heterogeneous collection of patient health information, whose broad adoption provides great opportunities for systematic health data mining. However, heterogeneous EHR data types and biased ascertainment impose computational challenges. Here, we present mixEHR, an unsupervised generative model integrating collaborative filtering and latent topic models, which jointly models the discrete distributions of data observation bias and actual data using latent disease-topic distributions. We apply mixEHR to 12.8 million phenotypic observations from the MIMIC dataset, and use it to reveal latent disease topics, interpret EHR results, impute missing data, and predict mortality in intensive care units. Using both simulation and real data, we show that mixEHR outperforms previous methods and reveals meaningful multi-disease insights.
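
The idea of jointly modeling "whether a value was recorded" and "what the value was" through shared latent topics can be illustrated with a toy simulation. The sketch below is not mixEHR and does not reproduce the paper's model or inference. It only generates synthetic records in which each patient has a topic mixture, and each topic drives both the probability that a lab test is ordered and the distribution of its result. All names (`OBSERVE_PROB`, `VALUE_PROBS`, `simulate_cohort`) and all parameter values are hypothetical and chosen purely for illustration.

```python
import random

# Hypothetical parameters for 2 topics and 3 lab tests (not from the paper).
# OBSERVE_PROB[k][t]: probability that test t is recorded under topic k.
OBSERVE_PROB = [
    [0.9, 0.1, 0.5],
    [0.1, 0.9, 0.5],
]
# VALUE_PROBS[k][t]: distribution over results (0 = normal, 1 = abnormal).
VALUE_PROBS = [
    [[0.2, 0.8], [0.7, 0.3], [0.5, 0.5]],
    [[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]],
]


def sample_dirichlet(alpha, rng):
    """Draw a probability vector from a Dirichlet via normalized gammas."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw an index with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def simulate_patient(theta, rng):
    """Return one record: a list with a result per test, or None if missing."""
    record = []
    for test in range(len(OBSERVE_PROB[0])):
        # The same latent topic governs both missingness and the value,
        # so missing entries are informative rather than random.
        topic = sample_categorical(theta, rng)
        if rng.random() < OBSERVE_PROB[topic][test]:
            record.append(sample_categorical(VALUE_PROBS[topic][test], rng))
        else:
            record.append(None)
    return record


def simulate_cohort(n_patients=1000, seed=0):
    """Return a list of (topic_mixture, record) pairs."""
    rng = random.Random(seed)
    n_topics = len(OBSERVE_PROB)
    cohort = []
    for _ in range(n_patients):
        theta = sample_dirichlet([0.5] * n_topics, rng)
        cohort.append((theta, simulate_patient(theta, rng)))
    return cohort


def observation_rate(cohort, test, keep):
    """Fraction of selected patients for whom the given test was recorded."""
    selected = [rec for theta, rec in cohort if keep(theta)]
    return sum(rec[test] is not None for rec in selected) / len(selected)
```

In data generated this way, whether test 0 appears in a record depends on the patient's topic mixture, which is the kind of non-random missingness the abstract describes. A model that ignores the missingness pattern discards that signal, whereas a model that treats the observation indicator as data can use it when inferring topics.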

