LG · Jul 11, 2024
How Deep is your Guess? A Fresh Perspective on Deep Learning for Medical Time-Series Imputation
Linglong Qian, Tao Wang, Jun Wang et al.
We present a comprehensive analysis of deep learning approaches for Electronic Health Record (EHR) time-series imputation, examining how architectural and framework biases combine to influence model performance. Our investigation reveals varying capabilities of deep imputers in capturing complex spatiotemporal dependencies within EHRs, and that model effectiveness depends on how its combined biases align with medical time-series characteristics. Our experimental evaluation challenges common assumptions about model complexity, demonstrating that larger models do not necessarily improve performance. Rather, carefully designed architectures can better capture the complex patterns inherent in clinical data. The study highlights the need for imputation approaches that prioritise clinically meaningful data reconstruction over statistical accuracy. Our experiments show imputation performance variations of up to 20% based on preprocessing and implementation choices, emphasising the need for standardised benchmarking methodologies. Finally, we identify critical gaps between current deep imputation methods and medical requirements, highlighting the importance of integrating clinical insights to achieve more reliable imputation approaches for healthcare applications.
LG · Dec 27, 2023 · Code
CSAI: Conditional Self-Attention Imputation for Healthcare Time-series
Linglong Qian, Joseph Arul Raj, Hugh Logan Ellis et al.
We introduce the Conditional Self-Attention Imputation (CSAI) model, a novel recurrent neural network architecture designed to address the challenges of complex missing data patterns in multivariate time series derived from hospital electronic health records (EHRs). CSAI extends state-of-the-art neural network-based imputation by introducing key modifications specific to EHR data: a) attention-based hidden state initialisation to capture both long- and short-range temporal dependencies prevalent in EHRs, b) domain-informed temporal decay to mimic clinical data recording patterns, and c) a non-uniform masking strategy that models non-random missingness by calibrating weights according to both temporal and cross-sectional data characteristics. Comprehensive evaluation across four EHR benchmark datasets demonstrates CSAI's effectiveness compared to state-of-the-art architectures in data restoration and downstream tasks. CSAI is integrated into PyPOTS, an open-source Python toolbox designed for machine learning tasks on partially observed time series. This work significantly advances the state of neural network imputation applied to EHRs by more closely aligning algorithmic imputation with clinical realities.
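To make the idea of temporal decay concrete, the sketch below computes per-feature time gaps since the last observation and turns them into decay factors, in the generic style of earlier recurrent imputers such as GRU-D and BRITS. It is not CSAI's domain-informed decay, which the abstract does not specify. The function names `time_gaps` and `temporal_decay` and the fixed parameters `w` and `b` are assumptions for illustration only; in a trained model the decay parameters would be learned.

```python
import numpy as np


def time_gaps(mask, timestamps):
    """Time elapsed since each feature was last observed.

    mask: (T, D) array, 1 where a value is observed, 0 where it is missing.
    timestamps: (T,) array of recording times.
    """
    T, D = mask.shape
    delta = np.zeros((T, D), dtype=float)
    for t in range(1, T):
        step = timestamps[t] - timestamps[t - 1]
        # If the previous value was observed, the gap restarts at `step`;
        # otherwise it keeps accumulating on top of the previous gap.
        delta[t] = step + (1 - mask[t - 1]) * delta[t - 1]
    return delta


def temporal_decay(delta, w=0.5, b=0.0):
    """Generic exponential decay in (0, 1]; w and b are illustrative constants."""
    return np.exp(-np.maximum(0.0, w * delta + b))


# Toy example: 4 time steps, 2 features, hourly recordings.
mask = np.array([[1, 1], [0, 1], [0, 0], [1, 1]])
timestamps = np.array([0.0, 1.0, 2.0, 3.0])

delta = time_gaps(mask, timestamps)
gamma = temporal_decay(delta)
```

The longer a feature has gone unobserved, the larger its gap and the smaller its decay factor, so a recurrent imputer relies less on stale information for that feature.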