LG, Oct 28, 2021

On the explainability of hospitalization prediction on a large COVID-19 patient dataset

arXiv:2110.15002v1, 3 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for reliable and interpretable AI predictions in high-stakes healthcare decisions for COVID-19 patients, though it is incremental in applying existing methods to a new dataset.

The researchers tackled the problem of predicting hospitalization for COVID-19 patients using AI models on a large dataset of over 110,000 patients. The models achieved high performance, with an average precision of 0.96-0.98 for non-hospitalized cases and 0.75-0.85 for hospitalized cases, but the explainability results varied significantly across models and scenarios.

We develop various AI models to predict hospitalization on a large cohort (over 110k) of US patients who tested positive for COVID-19, sourced from March 2020 to February 2021. Models range from Random Forest to Neural Network (NN) and Time Convolutional NN, where the data modalities (tabular and time-dependent) are combined at different stages (early vs. model fusion). Despite high data imbalance, the models reach average precision 0.96-0.98 (0.75-0.85), recall 0.96-0.98 (0.74-0.85), and $F_1$-score 0.97-0.98 (0.79-0.83) on the non-hospitalized (hospitalized) class. Performance does not drop significantly even when selected lists of features are removed to study model adaptability to different scenarios. However, a systematic study of the SHAP feature importance values for the developed models in the different scenarios shows large variability across models and use cases. This calls for more complete studies of several explainability methods before their adoption in high-stakes scenarios.
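The abstract reports precision, recall, and $F_1$ separately for the non-hospitalized and hospitalized classes, which matters because the classes are imbalanced. As a minimal sketch of how such per-class figures are computed, the snippet below treats each class in turn as the positive one. The function name `per_class_metrics` and the toy labels are illustrative assumptions only; they are not the paper's code or data.

```python
from typing import Dict, Sequence


def per_class_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], label: int
) -> Dict[str, float]:
    """Precision, recall and F1, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy, made-up labels (NOT the paper's data):
# 0 = non-hospitalized (majority class), 1 = hospitalized (minority class).
y_true = [0] * 90 + [1] * 10
# 88 correct negatives, 2 false alarms, 8 correct positives, 2 misses.
y_pred = [0] * 88 + [1] * 2 + [1] * 8 + [0] * 2

non_hosp = per_class_metrics(y_true, y_pred, label=0)
hosp = per_class_metrics(y_true, y_pred, label=1)
print("non-hospitalized:", non_hosp)
print("hospitalized:", hosp)
```

In this toy case the same four errors cost the minority class far more than the majority class, which is one reason to report the two classes side by side rather than a single pooled score.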
