Learning compressed representations of blood samples time series with missing data
This work addresses a domain-specific challenge in clinical data analysis: representation learning for time series with missing values. The contribution is incremental, since it builds on existing autoencoder and kernel methods.
The authors tackled the problem of learning compressed representations from multivariate time series with missing data, specifically for blood samples of patients with surgical site infection, and achieved improved classification performance compared to a standard autoencoder.
Clinical measurements collected over time are naturally represented as multivariate time series (MTS), which often contain missing data. An autoencoder can learn low-dimensional vectorial representations of MTS that preserve important data characteristics, but it cannot deal explicitly with missing data. In this work, we propose a new framework that combines an autoencoder with the Time series Cluster Kernel (TCK), a kernel that accounts for missingness patterns in MTS. Via kernel alignment, we incorporate TCK in the autoencoder to improve the learned representations in the presence of missing data. We consider a classification problem of MTS with missing values, representing blood samples of patients with surgical site infection. Compared with a standard autoencoder, our approach learns low-dimensional representations that are classified more accurately.
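The kernel alignment step can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the alignment term is the Frobenius distance between the normalized inner-product matrix of the learned codes and a normalized, precomputed TCK matrix. The function names, the weight `lam`, and the plain squared-error reconstruction term are hypothetical, and computing TCK itself is out of scope, so the kernel matrix is taken as given.

```python
import numpy as np


def kernel_alignment_loss(codes, prior_kernel):
    """Frobenius distance between the normalized code kernel and a prior kernel.

    codes:        (n_samples, code_dim) array of autoencoder codes.
    prior_kernel: (n_samples, n_samples) precomputed kernel matrix
                  (assumed here to come from TCK; not computed in this sketch).
    """
    # Linear kernel induced by the codes.
    code_kernel = codes @ codes.T
    # Normalize both matrices so the loss does not depend on their scale.
    code_kernel = code_kernel / np.linalg.norm(code_kernel, "fro")
    prior = prior_kernel / np.linalg.norm(prior_kernel, "fro")
    return np.linalg.norm(code_kernel - prior, "fro")


def combined_loss(x, x_rec, codes, prior_kernel, lam=0.1):
    """Reconstruction error plus weighted alignment term.

    Assumption: x has already had its missing entries imputed, and `lam`
    is a hypothetical trade-off weight, not a value from the source.
    """
    reconstruction = np.mean((x - x_rec) ** 2)
    return reconstruction + lam * kernel_alignment_loss(codes, prior_kernel)
```

Under these assumptions the alignment term is zero when the inner products of the codes match the prior kernel up to scale, so minimizing it pulls the code space toward the similarity structure encoded by the kernel.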