ML | LG | Mar 21, 2015

Fast Imbalanced Classification of Healthcare Data with Missing Values

Talayeh Razzaghi, Oleg Roderick, Ilya Safro, Nick Marko

arXiv:1503.06250v1 | 4.0 | 23 citations

Originality: Incremental advance

AI Analysis

This paper addresses classification of healthcare data with missing values and imbalanced classes, but the advance is incremental because it builds on existing SVM and imputation methods.

The paper tackles biased predictive modeling in medical data with missing values by proposing a multilevel framework that combines a cost-sensitive SVM with expectation-maximization imputation. The authors report faster and more accurate classification on benchmark and real health datasets.

In the medical domain, data features often contain missing values, which can introduce serious bias into predictive modeling. Standard data mining methods often perform poorly on such data. In this paper, we propose a new method that simultaneously classifies large datasets and reduces the effects of missing values. The proposed method is based on a multilevel framework that combines the cost-sensitive SVM with the expectation-maximization imputation method for missing values, which relies on iterated regression analyses. We compare the classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values, as well as on real data from health applications, and show that our multilevel SVM-based method produces fast, more accurate, and more robust classification results.
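The abstract names two ingredients: imputation that relies on iterated regression analyses, and cost-sensitive classification for imbalanced classes. The sketch below illustrates each idea in isolation using only the Python standard library. It is not the authors' multilevel framework, and it is not full expectation-maximization imputation (no covariance or residual-variance estimation). All function names are hypothetical, the imputation is restricted to two numeric columns for brevity, and the inverse-frequency weighting is a common heuristic assumed here rather than taken from the paper.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency misclassification costs: n / (k * count_c).

    Assumed heuristic (not from the paper): rare classes get larger costs.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def impute_iterated_regression(rows, n_iter=10):
    """Fill None entries in two-column rows by alternating regressions.

    Simplified stand-in for EM imputation: start from column means, then
    repeatedly regress each column on the other and refresh the entries
    that were originally missing.
    """
    miss = [(r[0] is None, r[1] is None) for r in rows]
    means = []
    for j in (0, 1):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    data = [[means[j] if r[j] is None else float(r[j]) for j in (0, 1)]
            for r in rows]
    for _ in range(n_iter):
        for j in (0, 1):
            other = 1 - j
            # Fit only on rows where column j was actually observed.
            train = [d for d, m in zip(data, miss) if not m[j]]
            a, b = fit_line([d[other] for d in train],
                            [d[j] for d in train])
            for d, m in zip(data, miss):
                if m[j]:
                    d[j] = a + b * d[other]
    return data


# Toy data following y = 2x + 1, with one missing y value.
rows = [[1, 3], [2, 5], [3, None], [4, 9]]
filled = impute_iterated_regression(rows)

# Toy imbalanced labels: 8 majority samples, 2 minority samples.
weights = balanced_class_weights([0] * 8 + [1] * 2)
```

In a cost-sensitive SVM, per-class weights like these scale the misclassification penalty so that errors on the minority class cost more. scikit-learn's `SVC`, for example, accepts such a mapping through its `class_weight` parameter.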
