Bootstrapping-based Regularisation for Reducing Individual Prediction Instability in Clinical Risk Prediction Models

arXiv:2602.11360v11.4

Originality: Incremental advance

AI Analysis

This work addresses unreliable individual predictions, which hinder the clinical adoption of prediction models in healthcare, particularly in data-limited settings. It provides a method that improves robustness without sacrificing interpretability. The contribution is incremental, as it builds on existing bootstrapping and regularisation techniques.

The paper tackles instability in deep learning-based clinical prediction models, whose predictions can vary substantially when the model is trained on different samples from the same population. It proposes a bootstrapping-based regularisation framework that embeds bootstrapping into neural network training to constrain this variability. The results show improved prediction stability, with lower mean absolute differences between predictions (e.g., 0.019 vs. 0.059 in GUSTO-I), while discriminative performance and feature importance consistency were maintained.
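The summary does not spell out how the mean absolute difference (MAD) is computed. A common way to quantify individual prediction instability is sketched below, assuming MAD is the absolute difference between each patient's predicted risk from the original model and from models refitted on bootstrap samples, averaged over bootstraps and then over patients. The function name `mean_absolute_difference` and the array shapes are illustrative assumptions, not the authors' code.

```python
import numpy as np


def mean_absolute_difference(original_preds, bootstrap_preds):
    """Mean absolute difference between original and bootstrap-model predictions.

    Assumed definition (not taken from the paper):
      original_preds:  shape (n_patients,), risks from the model fitted on the full data
      bootstrap_preds: shape (n_bootstraps, n_patients), one row per model refitted
                       on a bootstrap resample and evaluated on the same patients
    """
    original = np.asarray(original_preds, dtype=float)
    boot = np.asarray(bootstrap_preds, dtype=float)
    if boot.ndim != 2 or boot.shape[1] != original.shape[0]:
        raise ValueError("bootstrap_preds must have shape (n_bootstraps, n_patients)")
    # Per-patient instability: average absolute deviation across bootstrap models
    per_patient = np.abs(boot - original[np.newaxis, :]).mean(axis=0)
    # Overall MAD: average of the per-patient values
    return float(per_patient.mean())


# Two patients, two bootstrap models: the result is approximately 0.075
print(mean_absolute_difference([0.10, 0.50], [[0.20, 0.50], [0.10, 0.30]]))
```

Under this definition, a lower value (e.g., 0.019 rather than 0.059) means that an individual patient's predicted risk changes less when the training sample changes.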

Clinical prediction models are increasingly used to support patient care, yet many deep learning-based approaches remain unstable, as their predictions can vary substantially when trained on different samples from the same population. Such instability undermines reliability and limits clinical adoption. In this study, we propose a novel bootstrapping-based regularisation framework that embeds the bootstrapping process directly into the training of deep neural networks. This approach constrains prediction variability across resampled datasets, producing a single model with inherent stability properties. We evaluated models constructed using the proposed regularisation approach against conventional and ensemble models using simulated data and three clinical datasets: GUSTO-I, Framingham, and SUPPORT. Across all datasets, our model exhibited improved prediction stability, with lower mean absolute differences (e.g., 0.019 vs. 0.059 in GUSTO-I; 0.057 vs. 0.088 in Framingham) and markedly fewer significantly deviating predictions. Importantly, discriminative performance and feature importance consistency were maintained, with high SHAP correlations between models (e.g., 0.894 for GUSTO-I; 0.965 for Framingham). While ensemble models achieved greater stability, we show that this came at the expense of interpretability, as each constituent model used predictors in different ways. By regularising predictions to align with bootstrapped distributions, our approach allows prediction models to be developed that achieve greater robustness and reproducibility without sacrificing interpretability. This method provides a practical route toward more reliable and clinically trustworthy deep learning models, particularly valuable in data-limited healthcare settings.
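The abstract does not give the exact loss, so the following is only a minimal sketch of one way to "regularise predictions to align with bootstrapped distributions" while still producing a single model. It makes two loud assumptions. First, a logistic model trained by gradient descent stands in for the deep neural network. Second, the stability term is a hypothetical penalty `lam * mean((p - target)**2)` that pulls each patient's predicted risk toward the mean prediction of models refitted on bootstrap resamples. All function names and parameters here are illustrative, not the authors' method.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, target=None, lam=0.0, lr=0.1, epochs=2000):
    """Gradient-descent logistic model (a stand-in for a neural network).

    Loss = mean binary cross-entropy
           + lam * mean((p - target)**2)   # hypothetical stability penalty
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        # Gradient of mean BCE w.r.t. the logits z is (p - y) / n
        grad_z = (p - y) / n
        if target is not None and lam > 0:
            # Gradient of lam * mean((p - t)^2) w.r.t. z: lam * 2 (p - t) p (1 - p) / n
            grad_z = grad_z + lam * 2.0 * (p - target) * p * (1.0 - p) / n
        w -= lr * (X.T @ grad_z)
        b -= lr * grad_z.sum()
    return w, b


def bootstrap_targets(X, y, n_boot=50, seed=0):
    """Mean predicted risk on X across models refitted on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = np.empty((n_boot, n))
    for k in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample patients with replacement
        w, c = fit_logistic(X[idx], y[idx])
        preds[k] = sigmoid(X @ w + c)
    return preds.mean(axis=0)


def fit_bootstrap_regularised(X, y, lam=1.0, n_boot=50, seed=0):
    """Single model whose predictions are penalised for straying from bootstrap means."""
    target = bootstrap_targets(X, y, n_boot=n_boot, seed=seed)
    return fit_logistic(X, y, target=target, lam=lam)


# Usage on synthetic data (illustrative only)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (rng.random(200) < sigmoid(X @ np.array([1.0, -0.5, 0.0]))).astype(float)
w, b = fit_bootstrap_regularised(X, y, lam=1.0, n_boot=20)
print(w, b)
```

The key design point this sketch tries to capture is that the bootstrapping happens during training and the output is one set of parameters. An ensemble, by contrast, keeps every bootstrap model and averages them at prediction time, which is the source of the interpretability cost the abstract describes.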
