Survey of Imbalanced Data Methodologies
This is an incremental survey that addresses the common problem of imbalanced data, particularly relevant for practitioners in fields like finance.
The paper reviews and compares popular methodologies for handling imbalanced data, applies under-sampling and over-sampling techniques to several modeling algorithms on UCI and KEEL data sets, and analyzes performance across class-imbalance methods, modeling algorithms, and grid search criteria.
Imbalanced data sets are a common and well-studied problem in the financial industry. In this paper, we reviewed and compared some popular methodologies for handling data imbalance. We then applied under-sampling and over-sampling methodologies to several modeling algorithms on UCI and KEEL data sets. Performance was analyzed and compared across class-imbalance methods, modeling algorithms, and grid search criteria.
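To make the two resampling families concrete, the following is a minimal sketch of random under-sampling and random over-sampling, written with NumPy only. It is an illustration under stated assumptions, not the procedure used in the paper: the function names random_undersample and random_oversample, the toy 90/10 class split, and the choice to balance every class to exactly the minority (or majority) count are all hypothetical.

```python
import numpy as np


def random_undersample(X, y, rng):
    """Randomly drop rows so every class keeps only the minority-class count."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    return X[keep], y[keep]


def random_oversample(X, y, rng):
    """Keep all rows and duplicate random rows until every class matches the majority count."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    parts = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        # Draw the extra rows with replacement from this class only.
        extra = idx[rng.integers(0, len(idx), size=n_max - len(idx))]
        parts.extend([idx, extra])
    keep = np.concatenate(parts)
    return X[keep], y[keep]


# Toy data (assumed for illustration): 90 majority rows, 10 minority rows.
rng = np.random.default_rng(0)
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)

X_under, y_under = random_undersample(X, y, rng)
X_over, y_over = random_oversample(X, y, rng)
```

In practice, resampling of this kind is usually applied to the training split only, so that the evaluation data keeps the original class ratio.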