ML · LG · Mar 13, 2019

Predicting class-imbalanced business risk using resampling, regularization, and model ensembling algorithms

arXiv:1903.05535v12.210 citations

Originality Synthesis-oriented

AI Analysis

This work addresses business risk prediction for stakeholders dealing with imbalanced data, but it is incremental: it applies existing methods without introducing new algorithms.

The paper tackles class-imbalanced business risk modeling by combining resampling, regularization, and ensembling techniques. Its best model, Boosting on a decision tree trained on SMOTE-oversampled data, achieves an AUC of 0.8633, a recall of 0.9260, and an F1 score of 0.8907.
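For readers less familiar with the three reported metrics, the sketch below shows one common way to compute them from labels and scores. It uses made-up toy data, not the paper's data, and the function names `recall_f1` and `roc_auc` are hypothetical helpers rather than anything taken from the paper. It assumes at least one positive label and at least one positive prediction.

```python
def recall_f1(y_true, y_pred):
    """Recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, f1


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (illustrative values only).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold of 0.5 is an assumption

recall, f1 = recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

AUC depends only on how the scores rank positives against negatives, while recall and F1 depend on the chosen decision threshold, which is one reason AUC suits model comparison on imbalanced data.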

We aim to develop and improve imbalanced business risk modeling by jointly using proper evaluation criteria, resampling, cross-validation, classifier regularization, and ensembling techniques. The area under the receiver operating characteristic curve (AUC of ROC) is used for model comparison based on 10-fold cross-validation. Two undersampling strategies, random undersampling (RUS) and cluster centroid undersampling (CCUS), and two oversampling methods, random oversampling (ROS) and Synthetic Minority Oversampling Technique (SMOTE), are applied. Three highly interpretable classifiers are implemented: logistic regression without regularization (LR), L1-regularized LR (L1LR), and decision tree (DT). Two ensembling techniques, Bagging and Boosting, are applied to the DT classifier for further model improvement. The results show that Boosting on DT, trained on SMOTE-oversampled data containing 50% positives, is the optimal model, achieving an AUC, recall, and F1 score of 0.8633, 0.9260, and 0.8907, respectively.
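The SMOTE step that brings the training data to 50% positives can be sketched as follows. This is a minimal standard-library illustration of the general SMOTE idea (interpolating between a minority point and one of its nearest minority neighbors), not the authors' implementation. The function names, the toy data, and the default `k=5` are assumptions, and the sketch requires at least two minority points.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbors."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other points
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point, excluding itself
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def oversample_to_balance(X, y, k=5, seed=0):
    """Add synthetic positives (label 1) until both classes have equal size."""
    pos = [x for x, label in zip(X, y) if label == 1]
    n_neg = len(y) - len(pos)
    new_points = smote(pos, n_neg - len(pos), k=k, seed=seed)
    return X + new_points, y + [1] * len(new_points)


# Toy data (illustrative only): 6 negatives, 2 positives.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 2.0), (3.0, 1.0),
     (5.0, 5.0), (6.0, 5.0)]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = oversample_to_balance(X, y)
```

In a cross-validated comparison, oversampling of this kind is normally applied only to the training folds so that the held-out fold keeps its original class ratio. Whether the paper follows this exact protocol is not stated in the abstract.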
