Multi-task Prediction of Disease Onsets from Longitudinal Lab Tests
This work addresses patient risk stratification in healthcare by predicting disease onset directly from raw lab data. The contribution is incremental, since it applies existing neural network methods to a new domain.
The authors predict disease onsets from longitudinal lab tests by training an LSTM and two novel CNNs on 133 conditions, using 18 common lab tests from a cohort of 298K patients. They find that these neural networks significantly outperform a logistic regression baseline built on hand-engineered features.
Disparate areas of machine learning have benefited from models that can take raw data with little preprocessing as input and learn rich representations of that raw data in order to perform well on a given prediction task. We evaluate this approach in healthcare by using longitudinal measurements of lab tests, one of the more raw signals of a patient's health state widely available in clinical data, to predict disease onsets. In particular, we train a Long Short-Term Memory (LSTM) recurrent neural network and two novel convolutional neural networks for multi-task prediction of disease onset for 133 conditions based on 18 common lab tests measured over time in a cohort of 298K patients derived from 8 years of administrative claims data. We compare the neural networks to a logistic regression with several hand-engineered, clinically relevant features. We find that the representation-based learning approaches significantly outperform this baseline. We believe that our work suggests a new avenue for patient risk stratification based solely on lab results.
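To make the baseline setup concrete, the following is a minimal sketch of a multi-task logistic regression over hand-engineered lab features. Only the counts (18 lab tests, 133 conditions) come from the abstract. The specific summary features (latest value, mean, min, max, measurement count), the use of `None` for a missing measurement, and the function names `summarize_lab`, `patient_features`, and `predict_onsets` are assumptions made for illustration, not the authors' actual feature set or implementation.

```python
import math

N_LABS = 18         # number of lab tests (from the abstract)
N_CONDITIONS = 133  # number of conditions predicted jointly (from the abstract)
N_SUMMARY = 5       # features per lab; this choice is an assumption


def summarize_lab(values):
    """Summarize one lab's time series; None marks a step with no measurement."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Lab never measured: all-zero features, including a count of zero.
        return [0.0] * N_SUMMARY
    return [
        observed[-1],                  # most recent measurement
        sum(observed) / len(observed), # mean
        min(observed),
        max(observed),
        float(len(observed)),          # how often the lab was ordered
    ]


def patient_features(labs):
    """Concatenate the per-lab summaries into one flat feature vector."""
    features = []
    for series in labs:
        features.extend(summarize_lab(series))
    return features


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_onsets(features, weights, biases):
    """Score each condition with its own weight vector (multi-label output)."""
    probs = []
    for w, b in zip(weights, biases):
        z = sum(wi * xi for wi, xi in zip(w, features)) + b
        probs.append(sigmoid(z))
    return probs
```

A patient is passed in as a list of 18 series, one per lab, and the model returns 133 probabilities, one per condition. The neural networks described in the abstract would replace `patient_features` with a representation learned from the raw series rather than fixed summaries.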