LG · CV · Oct 25, 2021

Kernel density estimation-based sampling for neural network classification

arXiv:2110.12644v1 · 5 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses data imbalance for neural network practitioners, but it is incremental because it benchmarks an existing sampling method rather than proposing a new one.

The paper tackles the problem of imbalanced data in machine learning by comparing a kernel density estimation (KDE) sampling technique against two base methods for neural network classification, finding that KDE sampling produces the best performance on 6 out of 8 datasets.

Imbalanced data occurs in a wide range of scenarios. The skewed distribution of the target variable elicits bias in machine learning algorithms. One of the popular methods to combat imbalanced data is to artificially balance the data through resampling. In this paper, we evaluate the efficacy of a recently proposed kernel density estimation (KDE) sampling technique in the context of artificial neural networks. We benchmark the KDE sampling method against two base sampling techniques and perform comparative experiments using 8 datasets and 3 neural network architectures. The results show that KDE sampling produces the best performance on 6 out of 8 datasets. However, it must be used with caution on image datasets. We conclude that KDE sampling is capable of significantly improving the performance of neural networks.
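
The abstract does not spell out how the KDE sampler works, so the following is only a minimal, dependency-free sketch of the general idea, not the authors' implementation. It assumes a Gaussian kernel with a diagonal bandwidth chosen by Scott's rule, and it oversamples every minority class up to the size of the largest class. The function name `kde_oversample` and its parameters are hypothetical. Drawing from a Gaussian KDE amounts to picking a stored point at random and adding kernel-shaped noise to it.

```python
import random
import statistics
from collections import Counter


def kde_oversample(X, y, seed=0):
    """Balance classes by sampling synthetic points from a per-class Gaussian KDE.

    Assumptions (not taken from the paper): Gaussian product kernel with a
    diagonal bandwidth h_j = sigma_j * n ** (-1 / (d + 4)) (Scott's rule).
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the largest class

    X_out = [list(row) for row in X]
    y_out = list(y)

    for label, n in counts.items():
        if n == target:
            continue  # already as large as the majority class
        points = [row for row, lab in zip(X, y) if lab == label]
        d = len(points[0])
        factor = n ** (-1.0 / (d + 4))
        # Per-feature bandwidth; pstdev is 0.0 when the class has one point,
        # in which case the "synthetic" points are exact copies.
        bandwidths = [statistics.pstdev(col) * factor for col in zip(*points)]

        for _ in range(target - n):
            centre = rng.choice(points)  # pick one kernel centre uniformly
            X_out.append([rng.gauss(c, h) for c, h in zip(centre, bandwidths)])
            y_out.append(label)

    return X_out, y_out


if __name__ == "__main__":
    X = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.2], [0.2, 0.9], [5.0, 5.0], [5.5, 4.5]]
    y = [0, 0, 0, 0, 1, 1]
    X_bal, y_bal = kde_oversample(X, y, seed=1)
    print(Counter(y_bal))
```

Unlike random oversampling, which duplicates minority rows, the synthetic rows here are new points drawn around the existing ones. That smoothing is also a plausible reason for the caution about image data noted above: independent Gaussian noise on each pixel does not respect image structure. In practice a library estimator such as `scipy.stats.gaussian_kde` with its `resample` method would likely replace the hand-rolled sampler.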

