Renaissance of RNNs in Streaming Clinical Time Series: Compact Recurrence Remains Competitive with Transformers
We present a compact, strictly causal benchmark for streaming clinical time series built on per-second heart rate from the MIT-BIH Arrhythmia Database. Two tasks are studied under record-level, non-overlapping splits: near-term tachycardia risk (within the next ten seconds) and one-step-ahead heart rate forecasting. We compare a GRU-D recurrent network and a Transformer, trained under matched budgets, against strong non-learned baselines. Evaluation is calibration-aware for classification (with temperature scaling) and proper for forecasting, and is reported with grouped bootstrap confidence intervals. On MIT-BIH, GRU-D slightly outperforms the Transformer on tachycardia risk, whereas the Transformer clearly lowers forecasting error relative to both GRU-D and the persistence baseline. Our results show that, in longitudinal monitoring, model choice is task-dependent: compact RNNs remain competitive for short-horizon risk scoring, whereas compact Transformers deliver clearer gains for point forecasting.
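The abstract does not say how per-second heart rate is derived. A minimal sketch, assuming it comes from the database's beat annotations: R-peak sample indices (MIT-BIH is sampled at 360 Hz) are turned into RR intervals, and each second receives the heart rate of the most recent beat that falls before the end of that second, carried forward through seconds with no new beat, so no future beat is ever used. The helper name `per_second_hr` and the carry-forward rule are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

FS_HZ = 360  # MIT-BIH Arrhythmia Database sampling rate


def per_second_hr(r_peaks: np.ndarray, n_seconds: int, fs: int = FS_HZ) -> np.ndarray:
    """Hypothetical causal per-second heart rate (bpm) from sorted R-peak indices.

    Second s covers [s, s + 1). It gets 60 / RR of the latest beat strictly
    before s + 1; seconds without a new beat carry the previous value forward.
    Seconds before the second beat (no RR interval yet) are NaN.
    """
    beat_times = np.asarray(r_peaks, dtype=float) / fs  # beat times in seconds
    rr = np.diff(beat_times)                             # RR intervals (s)
    inst_hr = 60.0 / rr                                  # bpm, stamped at beat_times[1:]
    ends = np.arange(1, n_seconds + 1, dtype=float)      # end of each second
    # Number of RR-stamped beats strictly before each second's end, minus one,
    # is the index of the most recent usable beat (-1 means none yet).
    idx = np.searchsorted(beat_times[1:], ends, side="left") - 1
    hr = np.full(n_seconds, np.nan)
    valid = idx >= 0
    hr[valid] = inst_hr[idx[valid]]
    return hr
```

For beats at samples 0, 360, 540 and 900 over four seconds, this yields NaN, 120, 60 and 60 bpm; the last value is carried forward because no beat occurs in the final second.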
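The two task targets and the persistence baseline can be sketched as follows. This assumes a gap-free per-second series, a 100 bpm tachycardia threshold (the common adult cutoff; the abstract does not state the one used), and a risk label that is positive when any of the next ten per-second values exceeds it. `tachycardia_labels` and `persistence_mae` are hypothetical helpers.

```python
import numpy as np

TACHY_BPM = 100.0  # assumed threshold; not stated in the abstract
HORIZON_S = 10     # "next ten seconds" from the task description


def tachycardia_labels(hr: np.ndarray, horizon: int = HORIZON_S,
                       thresh: float = TACHY_BPM) -> np.ndarray:
    """Label second t as 1 if any of hr[t+1 .. t+horizon] exceeds thresh.

    Only seconds with a complete future window are labeled, so the output
    has len(hr) - horizon entries. Inputs for t must use hr[:t+1] only.
    """
    n = len(hr) - horizon
    labels = np.zeros(n, dtype=int)
    for t in range(n):
        labels[t] = int(np.any(hr[t + 1 : t + 1 + horizon] > thresh))
    return labels


def persistence_mae(hr: np.ndarray) -> float:
    """MAE of the persistence forecast hr_hat[t + 1] = hr[t]."""
    return float(np.mean(np.abs(hr[1:] - hr[:-1])))
```

Persistence is a natural non-learned reference for one-step forecasting: at a one-second horizon, heart rate rarely moves far, so a learned model must beat "no change" to be worth its cost.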
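Temperature scaling divides a classifier's logits by a single scalar T > 0 fitted on held-out data by minimizing negative log-likelihood. T > 1 softens overconfident probabilities without changing their ranking, so discrimination metrics are unaffected. A minimal NumPy sketch for the binary risk task; the log-spaced grid search in place of a gradient-based optimizer, and the helper names, are assumptions made for simplicity.

```python
import numpy as np


def binary_nll(logits: np.ndarray, labels: np.ndarray, temperature: float) -> float:
    """Mean binary cross-entropy of sigmoid(logits / temperature), computed stably."""
    s = logits / temperature
    # log(1 + e^s) - y * s equals -log p for y = 1 and -log(1 - p) for y = 0
    return float(np.mean(np.logaddexp(0.0, s) - labels * s))


def fit_temperature(logits: np.ndarray, labels: np.ndarray, grid=None) -> float:
    """Pick the temperature on a log-spaced grid that minimizes validation NLL."""
    if grid is None:
        grid = np.exp(np.linspace(np.log(0.05), np.log(20.0), 2001))
    losses = [binary_nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])


def calibrated_prob(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Calibrated risk probabilities after temperature scaling."""
    return 1.0 / (1.0 + np.exp(-logits / temperature))
```

If a model outputs logit ±4 but is right only 75% of the time, the fitted T is close to 4 / ln 3 ≈ 3.64, which maps those logits back to probabilities of about 0.75 and 0.25.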
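With record-level splits, per-second samples from the same record are strongly correlated, so resampling individual seconds would understate uncertainty. A grouped bootstrap instead resamples whole groups with replacement. The sketch below assumes that groups are record IDs and that intervals are percentile intervals; `grouped_bootstrap_ci` and its defaults are illustrative.

```python
import numpy as np


def grouped_bootstrap_ci(y_true, y_pred, groups, metric,
                         n_boot: int = 1000, alpha: float = 0.05, seed: int = 0):
    """Point estimate and percentile CI of metric, resampling whole groups.

    groups: one group ID per sample (here assumed to be the record ID).
    metric: callable(y_true_subset, y_pred_subset) -> float.
    """
    rng = np.random.default_rng(seed)
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    uniq = np.unique(groups)
    idx_by_group = {g: np.flatnonzero(groups == g) for g in uniq}
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Draw as many groups as exist, with replacement, and pool their samples
        sampled = rng.choice(uniq, size=len(uniq), replace=True)
        idx = np.concatenate([idx_by_group[g] for g in sampled])
        stats[b] = metric(y_true[idx], y_pred[idx])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(metric(y_true, y_pred)), float(lo), float(hi)
```

Example: with `mae = lambda a, b: float(np.mean(np.abs(a - b)))`, calling `grouped_bootstrap_ci(y_true, y_pred, record_ids, mae)` gives a forecasting MAE together with an interval that reflects between-record variability.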
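GRU-D handles missing inputs by decaying the last observed value toward the feature mean as the time since the last observation, δ, grows, using a decay rate γ = exp(−max(0, w·δ + b)). The sketch below covers only this input-decay step, not the full recurrent cell (which also decays its hidden state). It assumes one-second steps, diagonal per-feature parameters, and fixed illustrative values for `w` and `b` in place of learned ones. The helper name is hypothetical.

```python
import numpy as np


def grud_input_decay(x: np.ndarray, mask: np.ndarray, w: np.ndarray,
                     b: np.ndarray, x_mean: np.ndarray) -> np.ndarray:
    """GRU-D-style imputation of missing inputs with exponential decay.

    x:      (n_steps, D) observations; entries where mask == 0 are ignored
    mask:   (n_steps, D) 1 where observed, 0 where missing
    w, b:   (D,) per-feature decay parameters (learned in a real model)
    x_mean: (D,) feature means, e.g. computed on training records
    """
    n_steps, d = x.shape
    out = np.empty_like(x, dtype=float)
    last = np.asarray(x_mean, dtype=float).copy()  # last observed value per feature
    delta = np.zeros(d)                            # steps since last observation
    for t in range(n_steps):
        if t > 0:
            # Reset to one step if observed at t - 1, otherwise keep accumulating
            delta = np.where(mask[t - 1] == 1, 1.0, delta + 1.0)
        gamma = np.exp(-np.maximum(0.0, w * delta + b))
        imputed = gamma * last + (1.0 - gamma) * x_mean
        out[t] = np.where(mask[t] == 1, x[t], imputed)
        last = np.where(mask[t] == 1, x[t], last)
    return out
```

With w = ln 2, b = 0 and a feature mean of 60 bpm, a single 80 bpm reading followed by two missing seconds is imputed as 70 and then 65 bpm, drifting toward the mean. With w = 0, the rule reduces to plain carry-forward.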