LGAISPNov 3, 2025

Interpretable Heart Disease Prediction via a Weighted Ensemble Model: A Large-Scale Study with SHAP and Surrogate Decision Trees

arXiv:2511.01947v1h-index: 3
Originality Incremental advance
AI Analysis

It addresses the need for interpretable predictive models for early CVD risk assessment in public health, though it is incremental as it builds on existing methods.

This study tackled the problem of predicting cardiovascular disease risk by developing a weighted ensemble model combining tree-based methods and a CNN, achieving a test AUC of 0.8371 and 80.0% recall for improved screening.

Cardiovascular disease (CVD) remains a critical global health concern, demanding reliable and interpretable predictive models for early risk assessment. This study presents a large-scale analysis using the Heart Disease Health Indicators Dataset, developing a strategically weighted ensemble model that combines tree-based methods (LightGBM, XGBoost) with a Convolutional Neural Network (CNN) to predict CVD risk. The model was trained on a preprocessed dataset of 229,781 patients where the inherent class imbalance was managed through strategic weighting and feature engineering enhanced the original 22 features to 25. The final ensemble achieves a statistically significant improvement over the best individual model, with a Test AUC of 0.8371 (p=0.003) and is particularly suited for screening with a high recall of 80.0%. To provide transparency and clinical interpretability, surrogate decision trees and SHapley Additive exPlanations (SHAP) are used. The proposed model delivers a combination of robust predictive performance and clinical transparency by blending diverse learning architectures and incorporating explainability through SHAP and surrogate decision trees, making it a strong candidate for real-world deployment in public health screening.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes