ML LG | Feb 9, 2021

Classification of Imbalanced Credit scoring data sets Based on Ensemble Method with the Weighted-Hybrid-Sampling

arXiv:2102.04721v1

Originality: Incremental advance

AI Analysis

This work is significant for banks and financial institutions, as it aims to improve the accuracy of credit risk assessment by better classifying minority classes in imbalanced credit scoring datasets, potentially reducing commercial harm.

The paper addresses the challenge of imbalanced credit scoring datasets, where conventional machine learning models perform poorly on the minority class. The authors propose WHSBoost, a new ensemble algorithm that uses Weighted-SMOTE and Weighted-Under-Sampling to balance datasets, achieving improved classification for the minority class.

In the era of big data, credit-scoring models are increasingly used to determine the credit risk of applicants accurately. Conventional machine learning methods applied to credit scoring data sets tend to classify the minority class poorly, which may cause substantial commercial harm to banks. To classify imbalanced data sets, we propose a new ensemble algorithm, Weighted-Hybrid-Sampling-Boost (WHSBoost). In the data sampling step, we process the weighted imbalanced data set with the Weighted-SMOTE method and the Weighted-Under-Sampling method, and thus obtain a balanced training sample with equal weights. In the ensemble step, each time a base classifier is trained, it is given a balanced data set produced by this sampling procedure. To verify the applicability and robustness of the WHSBoost algorithm, we performed experiments on simulated data sets, real benchmark data sets, and real credit scoring data sets, comparing WHSBoost with SMOTE, SMOTEBoost, and HSBoost based on SVM, BPNN, DT, and KNN.
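The sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `weighted_hybrid_sample`, the per-class target size (half of the original sample), the label convention (1 = minority, 0 = majority), and the choice of weighted sampling without replacement via Efraimidis-Spirakis keys are all assumptions made for illustration. The abstract does not specify how Weighted-SMOTE or Weighted-Under-Sampling use the weights, so here they are simply treated as selection probabilities.

```python
import random


def weighted_hybrid_sample(X, y, w, k=3, seed=0):
    """Return a balanced (X, y, weights) sample. Hypothetical helper.

    Assumed conventions: label 1 = minority, 0 = majority, all weights > 0.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    target = len(y) // 2  # assumed per-class size after balancing

    # Weighted under-sampling (assumption): keep majority points without
    # replacement, favouring high-weight points via keys u ** (1 / w).
    n_keep = min(target, len(majority))
    keyed = sorted(majority,
                   key=lambda i: rng.random() ** (1.0 / w[i]),
                   reverse=True)
    kept = keyed[:n_keep]

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # Weighted SMOTE (assumption): choose a seed minority point in proportion
    # to its weight, then interpolate towards one of its k nearest minority
    # neighbours, as in standard SMOTE.
    synthetic = []
    n_new = max(0, n_keep - len(minority))
    minority_weights = [w[i] for i in minority]
    for _ in range(n_new):
        s = rng.choices(minority, weights=minority_weights, k=1)[0]
        others = sorted((j for j in minority if j != s),
                        key=lambda j: sq_dist(X[s], X[j]))
        nb = rng.choice(others[:k]) if others else s
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(X[s], X[nb])])

    X_bal = ([list(X[i]) for i in kept]
             + [list(X[i]) for i in minority]
             + synthetic)
    y_bal = [0] * len(kept) + [1] * (len(minority) + len(synthetic))
    # The balanced sample carries equal weights, as the abstract states.
    weights = [1.0 / len(y_bal)] * len(y_bal)
    return X_bal, y_bal, weights
```

In a boosting loop, a function like this would be called once per round with the current instance weights, and the base classifier (for example a decision tree) would be trained on the returned balanced sample. How the weights are updated between rounds is not described in the abstract and is left out of the sketch.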
