A Hybrid Approach for Binary Classification of Imbalanced Data
This work addresses poor minority-class accuracy in imbalanced data classification, a problem relevant to machine learning practitioners, but the contribution is incremental because it builds on existing hybrid and ensemble techniques.
The paper tackles binary classification on imbalanced datasets by proposing HADR, a hybrid approach combining data block construction, dimensionality reduction, and ensemble learning with deep neural networks, and reports that it outperforms state-of-the-art methods on eight public datasets in terms of recall, G-mean, and AUC.
Binary classification with an imbalanced dataset is challenging because models tend to assign all samples to the majority class. Existing solutions such as sampling methods, cost-sensitive methods, and ensemble learning methods improve minority-class accuracy, but they are limited by overfitting or by cost parameters that are difficult to choose. We propose HADR, a hybrid approach with dimension reduction that consists of data block construction, dimensionality reduction, and ensemble learning with deep neural network classifiers. We evaluate HADR on eight imbalanced public datasets in terms of recall, G-mean, and AUC. The results show that our model outperforms state-of-the-art methods.
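To make the three evaluation metrics concrete, the following is a minimal sketch of how recall, G-mean, and AUC can be computed for a binary problem. It is an illustration rather than the evaluation code used for HADR. It assumes that label 1 denotes the minority (positive) class, that G-mean is the usual geometric mean of recall and specificity, and that AUC is computed through its pairwise-ranking interpretation. The function names and the toy labels are hypothetical.

```python
import math


def recall_specificity_gmean(y_true, y_pred):
    """Return (recall, specificity, G-mean); label 1 is assumed to be the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Assumed definition: G-mean = sqrt(recall * specificity).
    return recall, specificity, math.sqrt(recall * specificity)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 minority samples and 8 majority samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that predicts the majority class for every sample.
all_majority = [0] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, all_majority)) / len(y_true)
majority_metrics = recall_specificity_gmean(y_true, all_majority)

# A classifier that finds one of the two minority samples, with one false alarm.
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.1, 0.1, 0.2, 0.3, 0.1]
pred_metrics = recall_specificity_gmean(y_true, y_pred)
pred_auc = auc(y_true, scores)
```

On the toy data, the all-majority classifier reaches an accuracy of 0.8 while its recall and G-mean are both 0, which is why accuracy alone is a misleading measure on imbalanced data and why recall, G-mean, and AUC are reported instead.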