LG · ML · Oct 17, 2019

Kernel density estimation based sampling for imbalanced class distribution

arXiv:1910.07842v2139 citations
Originality: Incremental advance
AI Analysis

This paper addresses imbalanced class distributions, a problem facing data scientists in fields such as fraud detection and medical diagnostics, and offers an incremental improvement over existing sampling methods.

The paper tackles class imbalance by proposing a kernel density estimation (KDE) based method for sampling the minority class. It reports that the method outperforms other sampling techniques on real-life datasets in terms of F1-score and G-mean, with consistent results across several classification algorithms and class distribution ratios.
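For readers unfamiliar with the two metrics, the sketch below computes them for a binary problem. It assumes the common definitions, F1 as the harmonic mean of precision and recall and G-mean as the square root of sensitivity times specificity; the summary does not spell out the paper's exact formulas, and the function name `f1_and_gmean` and the toy labels are illustrative only.

```python
import math


def f1_and_gmean(y_true, y_pred, positive=1):
    """Return (F1, G-mean) for binary labels, treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0

    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    gmean = math.sqrt(recall * specificity)               # assumed binary G-mean definition
    return f1, gmean


# Toy example: 2 minority and 6 majority instances (hypothetical labels).
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
f1, gmean = f1_and_gmean(y_true, y_pred)
```

Both metrics penalize a classifier that ignores the minority class, which plain accuracy does not: predicting all zeros here would be 75% accurate yet score 0 on both.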

Imbalanced response variable distribution is a common occurrence in data science. In fields such as fraud detection, medical diagnostics, system intrusion detection and many others where abnormal behavior is rarely observed, the data under study often features a disproportionate target class distribution. One common way to combat class imbalance is to resample the minority class to achieve a more balanced distribution. In this paper, we investigate the performance of a sampling method based on kernel density estimation (KDE). We believe that KDE offers a more natural way of generating new instances of the minority class that is less prone to overfitting than other standard sampling techniques. It is based on a well-established theory of nonparametric statistical estimation. Numerical experiments show that KDE can outperform other sampling techniques on a range of real-life datasets as measured by F1-score and G-mean. The results remain consistent across a number of classification algorithms used in the experiments. Furthermore, the proposed method outperforms the benchmark methods regardless of the class distribution ratio. We conclude, based on the solid theoretical foundation and strong experimental results, that the proposed method would be a valuable tool in problems involving imbalanced class distribution.
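The general idea of KDE-based oversampling can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a Gaussian product kernel with per-feature bandwidths from Scott's rule, neither of which is stated in the abstract, and the names `scott_bandwidths` and `kde_oversample` and the toy data are hypothetical. Sampling from a Gaussian KDE amounts to picking an observed minority point at random and perturbing it with Gaussian noise scaled by the bandwidth.

```python
import random
import statistics


def scott_bandwidths(points):
    """Per-feature bandwidth via Scott's rule: h_j = sigma_j * n^(-1/(d+4)) (assumed choice)."""
    n, d = len(points), len(points[0])
    factor = n ** (-1.0 / (d + 4))
    return [statistics.stdev(column) * factor for column in zip(*points)]


def kde_oversample(minority, n_new, seed=0):
    """Draw n_new synthetic points from a Gaussian KDE fitted to the minority class."""
    rng = random.Random(seed)
    bandwidths = scott_bandwidths(minority)
    synthetic = []
    for _ in range(n_new):
        center = rng.choice(minority)  # pick one kernel (one observed minority point)
        # Sample from the Gaussian kernel centered on that point.
        synthetic.append([rng.gauss(x, h) for x, h in zip(center, bandwidths)])
    return synthetic


# Toy data: 4 minority points, to be balanced against 10 majority points.
minority = [[1.0, 2.0], [1.2, 1.9], [0.8, 2.2], [1.1, 2.1]]
majority_size = 10
new_points = kde_oversample(minority, majority_size - len(minority))
balanced_minority = minority + new_points
```

Unlike random oversampling, which duplicates existing points exactly, the synthetic points here are spread around the observed ones, which is one plausible reading of why the authors expect less overfitting.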

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
