CSAI: Conditional Self-Attention Imputation for Healthcare Time-series
This work advances neural network imputation for electronic health records (EHRs) by aligning algorithmic methods more closely with clinical realities, addressing a domain-specific problem in healthcare data analysis.
The paper tackles complex missing data patterns in multivariate time series from hospital electronic health records by introducing the Conditional Self-Attention Imputation (CSAI) model, which is shown to be effective for data restoration and downstream tasks on four EHR benchmark datasets.
We introduce the Conditional Self-Attention Imputation (CSAI) model, a novel recurrent neural network architecture designed to address the challenges of complex missing data patterns in multivariate time series derived from hospital electronic health records (EHRs). CSAI extends state-of-the-art neural network-based imputation with three modifications tailored to EHR data: a) attention-based hidden state initialisation to capture both the long- and short-range temporal dependencies prevalent in EHRs, b) domain-informed temporal decay to mimic clinical data recording patterns, and c) a non-uniform masking strategy that models non-random missingness by calibrating weights according to both temporal and cross-sectional data characteristics. A comprehensive evaluation across four EHR benchmark datasets demonstrates CSAI's effectiveness compared with state-of-the-art architectures in both data restoration and downstream tasks. CSAI is integrated into PyPOTS, an open-source Python toolbox for machine learning on partially observed time series. This work significantly advances neural network imputation for EHRs by aligning algorithmic imputation more closely with clinical realities.
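The abstract names CSAI's mechanisms but not their equations. The Python sketch below illustrates, under stated assumptions, the kind of data handling that items (b) and (c) build on. `time_gaps` and `decay_factors` implement the standard GRU-D/BRITS-style decay, gamma = exp(-max(0, w*delta + b)), where delta is the time since a variable was last recorded; this is the common baseline, not CSAI's domain-informed variant. `nonuniform_mask` is a hypothetical masking rule whose weights (per-feature and per-time-step missing rates) are chosen for illustration only. All function and parameter names here are hypothetical and are not part of the PyPOTS API.

```python
import math
import random

# Assumed layout (illustrative): a series has T time steps and D features;
# mask[t][d] is 1 if feature d was observed at step t, else 0;
# timestamps[t] is the recording time s_t of step t.


def time_gaps(mask, timestamps):
    """Time elapsed since each feature was last observed (GRU-D/BRITS-style delta)."""
    T, D = len(mask), len(mask[0])
    delta = [[0.0] * D for _ in range(T)]  # delta is 0 at the first step by convention
    for t in range(1, T):
        step = timestamps[t] - timestamps[t - 1]
        for d in range(D):
            if mask[t - 1][d] == 1:
                delta[t][d] = step  # observed at t-1: the gap restarts
            else:
                delta[t][d] = step + delta[t - 1][d]  # still missing: gap accumulates
    return delta


def decay_factors(delta, w, b):
    """gamma = exp(-max(0, w*delta + b)) per entry, so gamma lies in (0, 1].

    w and b are hypothetical per-feature parameters; a real model learns them.
    """
    return [[math.exp(-max(0.0, w[d] * gap + b[d])) for d, gap in enumerate(row)]
            for row in delta]


def nonuniform_mask(mask, base_rate, seed=0):
    """Hypothetical non-uniform masking of observed entries for training.

    The hold-out probability of an observed entry rises with its feature's
    missing rate (cross-sectional) and its time step's missing rate (temporal).
    This weighting is an illustrative assumption, not CSAI's calibration.
    """
    T, D = len(mask), len(mask[0])
    feat_missing = [1.0 - sum(mask[t][d] for t in range(T)) / T for d in range(D)]
    step_missing = [1.0 - sum(mask[t]) / D for t in range(T)]
    rng = random.Random(seed)
    train_mask = [row[:] for row in mask]
    held_out = [[0] * D for _ in range(T)]
    for t in range(T):
        for d in range(D):
            if mask[t][d] == 1:
                p = min(1.0, base_rate * (1.0 + feat_missing[d] + step_missing[t]))
                if rng.random() < p:
                    train_mask[t][d] = 0  # hide from the model during training
                    held_out[t][d] = 1  # score reconstruction error here
    return train_mask, held_out


if __name__ == "__main__":
    mask = [[1, 0], [1, 0], [1, 1]]
    timestamps = [0.0, 1.0, 3.0]
    delta = time_gaps(mask, timestamps)
    print(delta)  # [[0.0, 0.0], [1.0, 1.0], [2.0, 3.0]]
    print(decay_factors(delta, w=[1.0, 1.0], b=[0.0, 0.0]))
    print(nonuniform_mask(mask, base_rate=0.2))
```

In a GRU-D-style recurrent imputer, such factors would scale the hidden state element-wise before each update, so information about a variable that has gone unrecorded for a long time fades gradually rather than persisting unchanged. The entries flagged by `nonuniform_mask` are observed values hidden from the model, which is what makes it possible to measure restoration error against ground truth.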