LG · ML · May 7, 2020

Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach

arXiv:2005.03582v1 · 29 citations
Originality: Incremental advance
AI Analysis

This work addresses early infection detection in ICU patients, a problem that matters to both patients and healthcare systems, but its contribution is incremental because it builds on existing methods for handling imbalanced data.

The study tackles the prediction of healthcare-associated infections in ICU patients from imbalanced data. It proposes a clustering-based undersampling strategy combined with ensemble classifiers, which outperformed other methods in a comparative analysis on a dataset of 4616 patients.

Early detection of patients vulnerable to infections acquired in the hospital environment is a challenge for current health systems, given the impact that such infections have on patient mortality and healthcare costs. This work focuses on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units by means of machine-learning methods. The aim is to support decision making aimed at reducing the incidence rate of infections. In this field, it is necessary to deal with the problem of building reliable classifiers from imbalanced datasets. We propose a clustering-based undersampling strategy to be used in combination with ensemble classifiers. A comparative study with data from 4616 patients was conducted to validate our proposal. We applied several single and ensemble classifiers both to the original dataset and to data preprocessed with different resampling methods. The results were analyzed using classic and recent metrics specifically designed for imbalanced data classification. They showed that the proposal is more efficient than the other approaches.
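The abstract does not spell out how the clustering-based undersampling works, so the following is only a minimal sketch of one common variant, not necessarily the authors' procedure. It assumes the majority class (for example, non-infected patients) is grouped by k-means into as many clusters as there are minority samples, and that the real sample nearest each centroid is kept. The function names `assign`, `kmeans`, and `cluster_undersample`, the label convention (0 = majority), and the toy data are all hypothetical.

```python
import math
import random


def assign(points, centroids):
    """Group points by the index of their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda j: math.dist(p, centroids[j]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its members; keep it if empty.
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else old
            for members, old in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def cluster_undersample(X, y, majority_label=0, seed=0):
    """Shrink the majority class to roughly the size of the other classes.

    Assumption (not from the paper): one cluster per minority sample,
    and one real majority sample retained per non-empty cluster.
    """
    majority = [x for x, label in zip(X, y) if label == majority_label]
    rest = [(x, label) for x, label in zip(X, y) if label != majority_label]
    k = min(len(majority), len(rest))
    if k == 0:
        raise ValueError("need at least one sample in each class")
    centroids, clusters = kmeans(majority, k, seed=seed)
    kept = [
        min(members, key=lambda p: math.dist(p, c))
        for c, members in zip(centroids, clusters)
        if members
    ]
    X_bal = kept + [x for x, _ in rest]
    y_bal = [majority_label] * len(kept) + [label for _, label in rest]
    return X_bal, y_bal


# Toy data: 40 majority points and 5 minority points in two dimensions.
rng = random.Random(1)
X_demo = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
X_demo += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(5)]
y_demo = [0] * 40 + [1] * 5

X_bal, y_bal = cluster_undersample(X_demo, y_demo)
print("majority kept:", y_bal.count(0), "minority kept:", y_bal.count(1))
```

The balanced output would then be used to train an ensemble classifier, which is the combination the abstract describes. In practice a library routine such as imbalanced-learn's `ClusterCentroids` would normally replace the hand-written k-means used here.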

