A SHAP-based explainable multi-level stacking ensemble learning method for predicting the length of stay in acute stroke
This work addresses care planning for acute stroke patients, but it is incremental as it builds on existing ensemble methods with added interpretability.
The paper tackled the problem of predicting length of stay in acute stroke by developing an explainable multi-level stacking ensemble model, achieving an AUC of 0.824 for ischemic stroke and 0.843 for hemorrhagic stroke, with statistically significant improvement over logistic regression for ischemic stroke.
Length of stay (LOS) prediction in acute stroke is critical for improving care planning. Existing machine learning models have shown suboptimal predictive performance, limited generalisability, and have overlooked system-level factors. We aimed to enhance model efficiency, performance, and interpretability by refining predictors and developing an interpretable multi-level stacking ensemble model. Data were accessed from the biennial Stroke Foundation Acute Audit (2015, 2017, 2019, 2021) in Australia. Models were developed for ischaemic and haemorrhagic stroke separately. The outcome was prolonged LOS (the LOS above the 75th percentile). Candidate predictors (ischaemic: n=89; haemorrhagic: n=83) were categorised into patient, clinical, and system domains. Feature selection with correlation-based approaches was used to refine key predictors. The evaluation of models included discrimination (AUC), calibration curves, and interpretability (SHAP plots). In ischaemic stroke (N=12,575), prolonged LOS was >=9 days, compared to >=11 days in haemorrhagic stroke (N=1,970). The ensemble model achieved superior performance [AUC: 0.824 (95% CI: 0.801-0.846)] and statistically outperformed logistic regression [AUC: 0.805 (95% CI: 0.782-0.829); P=0.0004] for ischaemic. However, the model [AUC: 0.843 (95% CI: 0.790-0.895)] did not statistically outperform logistic regression [AUC: 0.828 (95% CI: 0.774-0.882); P=0.136] for haemorrhagic. SHAP analysis identified shared predictors for both types of stroke: rehabilitation assessment, urinary incontinence, stroke unit care, inability to walk independently, physiotherapy, and stroke care coordinators involvement. An explainable ensemble model effectively predicted the prolonged LOS in ischaemic stroke. Further validation in larger cohorts is needed for haemorrhagic stroke.