LG | ML | Jun 30, 2023

The Effect of Balancing Methods on Model Behavior in Imbalanced Classification Problems

arXiv:2307.00157v1 | 13 citations | h-index: 35 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of biased model behavior caused by balancing methods, a concern for machine learning researchers and practitioners, and offers an incremental improvement in analysis tools.

The study investigated how balancing methods affect model behavior in imbalanced classification, revealing significant changes that can bias models toward balanced distributions, and proposed a new performance gain plot for optimal method selection.

Imbalanced data poses a significant challenge in classification, as model performance is affected by insufficient learning from minority classes. Balancing methods are often used to address this problem. However, such techniques can lead to problems such as overfitting or loss of information. This study addresses a more challenging aspect of balancing methods: their impact on model behavior. To capture these changes, Explainable Artificial Intelligence tools are used to compare models trained on datasets before and after balancing. In addition to the variable importance method, this study uses the partial dependence profile and accumulated local effects techniques. Real and simulated datasets are tested, and an open-source Python package, edgaro, is developed to facilitate this analysis. The results show significant changes in model behavior due to balancing methods, which can lead to models biased toward a balanced distribution. These findings confirm that the analysis of balancing should go beyond model performance comparisons to achieve higher reliability of machine learning models. Therefore, we propose a new method, the performance gain plot, to support an informed data balancing strategy: it helps select a balancing method by weighing the measure of change in model behavior against the performance gain.
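
The comparison described in the abstract can be illustrated with a minimal sketch. It does not use the edgaro package or reproduce the paper's experiments; it relies only on NumPy and scikit-learn. The synthetic dataset, the random forest, the hand-written random oversampling, the manually computed partial dependence profile, and the use of the standard deviation of profile differences as a "behavior change" score are all assumptions chosen for illustration, and the helper names (random_oversample, partial_dependence_profile) are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split


def random_oversample(X, y, rng):
    """Duplicate random minority-class rows until both classes have equal size."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    deficit = counts.max() - counts.min()
    minority_idx = np.flatnonzero(y == minority)
    extra = rng.choice(minority_idx, size=deficit, replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]


def partial_dependence_profile(model, X, feature, grid):
    """Mean predicted probability of class 1 with one feature fixed at each grid value."""
    profile = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value
        profile.append(model.predict_proba(X_mod)[:, 1].mean())
    return np.array(profile)


# Synthetic imbalanced binary problem (assumed setup: roughly 5% minority class).
rng = np.random.default_rng(0)
X, y = make_classification(
    n_samples=2000, n_features=5, n_informative=3,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Balance only the training data; the test set keeps the original distribution.
X_bal, y_bal = random_oversample(X_train, y_train, rng)

model_orig = RandomForestClassifier(n_estimators=100, random_state=0)
model_orig.fit(X_train, y_train)
model_bal = RandomForestClassifier(n_estimators=100, random_state=0)
model_bal.fit(X_bal, y_bal)

# Partial dependence profiles of both models, evaluated on the same reference data.
feature = 0
grid = np.quantile(X_train[:, feature], np.linspace(0.05, 0.95, 20))
pdp_orig = partial_dependence_profile(model_orig, X_train, feature, grid)
pdp_bal = partial_dependence_profile(model_bal, X_train, feature, grid)

# Illustrative change score: spread of the pointwise profile differences.
behavior_change = float(np.std(pdp_orig - pdp_bal))

# Performance gain from balancing, measured on the untouched test set.
performance_gain = (
    balanced_accuracy_score(y_test, model_bal.predict(X_test))
    - balanced_accuracy_score(y_test, model_orig.predict(X_test))
)

print(f"behavior change (feature {feature}): {behavior_change:.4f}")
print(f"performance gain (balanced accuracy): {performance_gain:.4f}")
```

Repeating this for several balancing methods and plotting each method's performance gain against its behavior change gives a rough analogue of the trade-off the abstract describes: a method is attractive when it improves performance without shifting the explanations much.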

Code Implementations: 2 repos